MACHINE TRANSLATION AND TELECOMMUNICATIONS SYSTEM
USING [USER ID] CONTROL DATA TO SELECT LANGUAGES AND DICTIONARIES 

SPECIFICATION 

Technical Field 

This invention relates to a system for automatic (machine) 
translation of text and, more particularly, to a telecommunications- 
based system for automatically translating and sending text from a 
sender to a recipient in another language. 

Background Art 

After several decades of development, the field of 
automatic (machine) translation of text from a source language to a 
target language with a minimum of human intervention has developed to 
a rudimentary level where machine translation systems with limited 
vocabularies or limited language environments can produce a basic 
level of acceptably translated text. Some current systems can 
produce translations for unconstrained input in a selected language 
pair, i.e., from a chosen source language to a chosen target 
language, that are perhaps 50% acceptable to a native writer in the
target language (using an arbitrary scale measure). When the
translation system is constrained to a particular vocabulary or
syntax style of a limited area of application (referred to as a
"sublanguage"), the results that can now be achieved may approach a
level that is 90% acceptable to a native writer. The wide difference
in results is attributable to the difficulty of producing acceptable
translations when the system must encompass a wide variability in
vocabulary use, syntax, and expression, as compared to the limited
vocabularies and translation equivalents of a chosen sublanguage.

One example of a machine translation system limited to a
specific sublanguage application is the TAUM-METEO system developed
by the University of Montreal for translating weather reports issued
by the Canadian Environment Department from English into French.
TAUM-METEO uses the transfer method of translation, which consists
basically of three steps: (1) analyzing the sequence and
morphological forms of input words of the source language and
determining their phrase and sentence structure; (2) transferring
(directly translating) the input text into sentences of equivalent
words of the target language using dictionary look-up and a developed
set of transfer rules for word and/or phrase selections; and (3)
synthesizing an acceptable output text in the target language using
developed rules for target language syntax and grammar. TAUM-METEO
was designed to operate for the English-French language pair in the
narrow sublanguage of meteorology (1,500 dictionary entries,
including several hundred place names; input texts containing no
tensed verbs). It therefore can obtain high levels of translation
accuracy of 80% to 90% by avoiding the need for any significant level
of morphological analysis of input words, by analyzing input texts
for domain-specific word markers which narrow the range of choices
for output word selection and syntax structure, and by using ad hoc
transfer rules for output word and phrase selections.
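
The three steps of the transfer method can be sketched in a few lines
of Python. This is only a toy illustration under stated assumptions:
the four-word lexicon, the function names, and the trivial synthesis
rule are invented for this example and are not taken from TAUM-METEO
or any other system named in this specification.

```python
# Toy illustration of the three transfer-method steps.
# The lexicon below is hypothetical and exists only for this example.
TRANSFER_DICT = {
    "cloudy": "nuageux",
    "today": "aujourd'hui",
    "tomorrow": "demain",
    "rain": "pluie",
}


def analyze(text):
    """Step 1 (analysis): split the source text into lower-cased tokens."""
    tokens = (word.strip(".,").lower() for word in text.split())
    return [tok for tok in tokens if tok]


def transfer(tokens):
    """Step 2 (transfer): dictionary look-up; unknown words pass through."""
    return [TRANSFER_DICT.get(tok, tok) for tok in tokens]


def synthesize(words):
    """Step 3 (synthesis): join words, capitalize, and add a period."""
    sentence = " ".join(words)
    return sentence[:1].upper() + sentence[1:] + "."


def translate(text):
    """Run analysis, transfer, and synthesis in sequence."""
    return synthesize(transfer(analyze(text)))
```

A real transfer system would replace each function with a far richer
component (morphological analysis, transfer rules, target grammar); the
sketch only shows how the three stages chain together.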

Another example of a sublanguage translation system is the
METAL system developed by the Linguistics Research Center at the
University of Texas at Austin for large-volume translations from
German into English of texts in the field of telecommunications. The
METAL system also uses the transfer method, but adds a fourth step
called "integration" between the analysis and transfer steps. The
integration step attempts to reduce the variability of output word
selection and syntax by performing tests on the constituent words of
the input text strings and constraining their application based upon
developed grammar and phrase structure rules. Transfer dictionaries
typically consist of roughly 10,000 word pairs. In terms of
translation quality, the METAL system is reported to have achieved
between 45% and 85% correct translations.



A strategy competing with the transfer approach is the
"interlingua" approach, which attempts to decompile input texts of a
source language into an intermediate language which represents their
"meaning" or semantic content, and then convert the semantic
structures into equivalent output sentences of a target language by
using a knowledge base of contextual, lexical, and syntactic rules.
Historically, transfer systems lacking a comprehensive knowledge base
and limited to translation of sentences in isolation have had the
central problem of obtaining accurate word and phrase selections in
the face of ambiguities presented by homonyms, polysemic phrases, and
anaphoric references. The interlingua approach is favored because
its representation of text meaning within a context larger than
single sentences can, in theory, greatly reduce ambiguity in the
analysis of input texts. Also, once the input text has been
decompiled into a semantic structure, it can theoretically be
translated into multiple target languages using the linguistic and
semantic rules developed for each target language. In practice,
however, the interlingua approach has proven difficult to implement
because it requires the development of a universal symbolic language
for representing "meaning" and comprehensive knowledge bases for
making the conversions from source to intermediate and then to target
languages. Examples of interlingua systems include the Distributed
Language Translation (DLT) project undertaken in Utrecht, the
Netherlands, and the Knowledge-Based Machine Translation (KBMT)
system of the Center for Machine Translation at Carnegie-Mellon
University.



Other machine translation systems have been developed or
are under development using modifications or hybrids of the transfer
and interlingua approaches. For example, some systems use human
pre-editing and/or post-editing to reduce text ambiguity and improve
the correctness of word and phrase selections. Other systems attempt
to combine a basic transfer approach with knowledge bases and
artificial intelligence techniques for machine editing and
enhancement. Another approach is to combine decompilation to a
syntactically-based intermediate structure with transfer to
equivalent output phrases and sentences. For a further discussion of
current developments in machine translation, reference is made to
Machine Translation: Theoretical and Methodological Issues, edited by
Sergei Nirenburg, published by Cambridge University Press, 1987, and
"Proceedings of The Third International Conference on Theoretical and
Methodological Issues in Machine Translation of Natural Language",
published by Linguistics Research Center, University of Texas at
Austin, June 1990.



It is expected that machine translation (MT) systems will
develop in time to provide higher levels of translation accuracy and
utility. Although current MT techniques using a basic transfer
approach can produce acceptable translation accuracy in a selected
sublanguage, they are not in widespread use. One reason for the
limited use of MT systems is that most current systems are designed
for a single, specific context of application, environment, and
language pair. The requirements of that context motivate the design
and development of the grammar, dictionary structure, and parsing
algorithms. Thus, the utility of the system becomes confined to that
particular context. This approach greatly limits the range of
applications and the audience of users which can be productively
served by such application- and language-specific MT systems.



Summary of Invention

It is therefore a principal object of the present invention
to provide a system which can perform machine translation among a
plurality of source languages, target languages, and sublanguages,
and automatically send the translated text via telecommunications
links to one or more recipients in different languages and/or in
different locations. The system should be capable of providing
acceptable levels of translation accuracy and be readily upgradable
to higher levels of accuracy and utility. It is a further object
that such a system be capable of operation with a minimum of human
intervention, yet have interactive utilities for obtaining and adding
new word entries to its dictionary database. It is also desired that
such a system be capable of building and organizing a large-scale
dictionary database containing core language dictionaries, plural
sublanguage dictionaries, and individual user dictionaries in a
manner which cumulates and evolves over time.

In accordance with a principal aspect of the present
invention, a machine translation and telecommunications system
comprises:

(a) a machine translation module which is capable of
performing machine translation from input text of a source language
to output text of a target language;

(b) a receiving interface for receiving input via a first
telecommunications link, said input including an input text to be
translated accompanied by a control portion having at least a first
predefined field therein for designating an address of a recipient to
which translated output text is to be sent;

(c) a recognition module coupled to said receiving
interface for electronically scanning the control portion and
recognizing the address of the recipient designated in the first
predefined field of the control portion; and

(d) an output module including a sending interface for
sending translated output text generated by said machine translation
module to the address of the recipient recognized by said recognition
module via a second telecommunications link.

In a more specific aspect of the invention relating to
[sublanguage] source/target language selection, a machine translation
system comprises:

(a) a receiving interface for receiving an input text in a
source language accompanied by [and a sublanguage] a control input
including at least a first predefined field containing an address of
a recipient to receive output text in a target language and a second
predefined field containing a source/target language control input
indicative of a selected [sublanguage] source/target language pair
for translation applicable to the input text from among a plurality
of possible [sublanguages] source/target language pairs;

(b) a machine translation module capable of performing
machine translation of an input text in a source language to an
output text in a target language using a dictionary database
containing entries for words of the target language corresponding to
words of the source language;

(c) a dictionary database containing a plurality of
source/target language dictionaries, each [including a core language
dictionary] containing entries for generic words of [the] a source
and target language[s] pair [, and a plurality of sublanguage
dictionaries each containing entries for specialized words of a
respective sublanguage];

(d) a dictionary control module responsive to the
[sublanguage] source/target language control input for selecting a
[sublanguage] source/target language dictionary of the dictionary
database which is applicable to the input text, and for causing the
machine translation module to use the selected [sublanguage]
source/target language dictionary in performing translation of the
input text; and

(e) an output module responsive to the address of the
control input for outputting translated text in the target language
generated by the machine translation module and automatically routing
it to be sent to the recipient's address.
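
As a rough illustration of elements (a) through (e), the following
Python sketch shows a control input with two predefined fields being
used to select a language-pair dictionary and to route the output.
Everything in it is an assumption made for illustration: the
`ControlInput` type, the two-entry dictionary database, the language
codes, and the word-for-word "translation" are hypothetical and are
not specified by the text above.

```python
from dataclasses import dataclass


@dataclass
class ControlInput:
    recipient_address: str  # first predefined field
    language_pair: tuple    # second predefined field, e.g. ("en", "fr")


# Hypothetical dictionary database keyed by source/target language pair.
DICTIONARY_DATABASE = {
    ("en", "fr"): {"hello": "bonjour"},
    ("en", "de"): {"hello": "hallo"},
}


def select_dictionary(control):
    """Dictionary control: pick the dictionary for the requested pair."""
    try:
        return DICTIONARY_DATABASE[control.language_pair]
    except KeyError:
        raise ValueError(f"no dictionary for pair {control.language_pair}")


def translate_and_route(text, control):
    """Translate word by word, then return (address, translated text)."""
    dictionary = select_dictionary(control)
    words = text.lower().split()
    output = " ".join(dictionary.get(word, word) for word in words)
    return control.recipient_address, output
```

The point of the sketch is only the data flow: the same control input
drives both dictionary selection and output routing, so no operator
needs to intervene between reception and sending.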

In another aspect of the present invention, [the] a
sublanguage control input causes a selected sublanguage dictionary
deemed especially applicable to the input text to be used in order to
perform more accurate translation of the input text. The dictionary
database includes core and sublanguage dictionaries for a plurality
of source/target languages and sublanguages. The machine translation
system with this capability for multiple core languages and
sublanguages is employed in a telecommunications system which
automatically translates and transmits text from [a] one or more
senders to one or more recipients in [other] different languages. A
cover page or header accompanying the input text is used to designate
the selected source/target languages, the applicable sublanguages,
and the address(es) -- electronic, fax, or mail -- of the
recipient(s).

In a preferred embodiment, the receiving interface receives
input text as electronic (machine-readable) text over a
communications line, or as page image data via a fax/modem board or
page scanner. The receiving interface is operated in a computer
server along with a recognition module for converting any page image
data to electronic text. The recognition module scans and recognizes
designations of the cover page or header accompanying the input text
for determining the selections of the source/target languages and
sublanguage(s) applicable to the input text. In the case of
electronic text, the cover page and the input text may be introduced
by means of a disk file, by downloading an electronic file, or by
online user-system interaction. An optional online interaction mode
can prompt the user for information concerning the user's identity,
sublanguage preferences, and/or a particular input text to be
translated in order to facilitate generation of a suitable cover
page. Inferencing algorithms may be used to assess the user and
cover page information and determine the applicable sublanguage
dictionary(ies).

The output module may have a page formatting program for
composing the translated output text into a desired page format
appropriate to a particular recipient or target language. It may
also have a footnoting function for providing footnotes of ambiguous
phrases of the input text in their original source language and/or
with alternate translations in the target language. The output
module includes a sending interface coupled to a fax/modem board for
facsimile transmission, or a printer for printing output pages, or a
telecommunications interface for sending output electronic text to a
recipient's electronic address. The modularity of the receiving
interface, dictionary database, dictionary control module, and output
module from the machine translation module assures that, as machine
translation improvements are developed, the machine translation
module may be upgraded or replaced without rendering the other
portions of the system dysfunctional or obsolete.

As another aspect of the invention related to a machine
dictionary database, a machine translation system comprises:

(a) a machine translation module capable of performing
machine translation of input text in a source language to output text
in a target language using a dictionary database containing entries
for words of the target language corresponding to words of the source
language;

(b) a dictionary database including a core language
dictionary containing entries for generic words of the source/target
languages, a plurality of sublanguage dictionaries each containing
entries for specialized words of a respective sublanguage used by a
group of users, and a plurality of user dictionaries each containing
entries for individualized words of a respective user; and

(c) a dictionary control module responsive to control
inputs to the machine translation system for causing the machine
translation module to use the core language dictionary, any
applicable sublanguage dictionary, and any applicable user dictionary
for performing translation of an input text attributed to a user of
the system.
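
One way to picture how the three dictionary levels might be consulted
together is a layered look-up. The sketch below is hypothetical: the
specification states that the core, sublanguage, and user dictionaries
are all used, but the precedence shown here (user first, then
sublanguage, then core) is an assumption for illustration, as are the
function name and the sample entries.

```python
def lookup(word, user_dict, sublanguage_dicts, core_dict):
    """Return (translation, level) from the most specific layer that
    contains the word, or (None, None) if no layer has an entry.

    Assumed precedence: user, then each sublanguage in order, then core.
    """
    layers = [("user", user_dict)]
    layers += [("sublanguage", d) for d in sublanguage_dicts]
    layers.append(("core", core_dict))
    for level, dictionary in layers:
        if word in dictionary:
            return dictionary[word], level
    return None, None
```

For example, with a core entry `{"bank": "banque"}` and a hypothetical
geography sublanguage entry `{"bank": "rive"}`, the sublanguage entry
would win for users of that sublanguage, while other users would still
receive the generic core translation.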

In this aspect of the invention, a large-scale dictionary
database is maintained which has dictionaries containing word entries
specified linguistically at different hierarchical levels of usage.
At the lowest (user) level, a particular user can enter temporary or
"scratch" word entries into a respective user dictionary. The
machine translation system uses the particular user's dictionary to
perform machine translation of text which may contain idiosyncratic
or new words or phrases particularly used by that user. The
dictionary control module includes dictionary maintenance utilities
which allow such scratch entries to be entered by users into their
user dictionaries, and which assist a dictionary maintenance operator
(DMO) to review the scratch entries so that they can be confirmed as
valid dictionary entries for machine translation. The dictionary
maintenance utilities include automated programmed procedures for
assessing whether particular word entries appearing in the user
dictionaries can be moved into a higher-level sublanguage dictionary
for a given domain or group of users. In a similar manner, word
entries that appear in common in the sublanguage dictionaries of a
wide range of domains or user groups can be moved as generic word
entries into the core language dictionary.
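
The promotion of shared entries from subordinate to superordinate
dictionaries can be sketched as a simple frequency test. This is not
the procedure of the specification, which does not state a criterion;
the share threshold, the function name, and the sample entries are all
assumptions for illustration. The function only proposes candidates,
leaving confirmation to the dictionary maintenance operator.

```python
from collections import Counter


def promotion_candidates(subordinate_dicts, min_share=0.6):
    """Propose entries for promotion to the next dictionary level.

    An entry (source word, target word) is proposed when it appears in
    at least min_share of the subordinate dictionaries. min_share must
    exceed 0.5 so that two conflicting targets for the same source word
    can never both qualify.
    """
    if not 0.5 < min_share <= 1.0:
        raise ValueError("min_share must be in (0.5, 1.0]")
    counts = Counter()
    for dictionary in subordinate_dicts:
        counts.update(dictionary.items())
    needed = min_share * len(subordinate_dicts)
    return {src: tgt for (src, tgt), n in counts.items() if n >= needed}
```

The same function could be applied at either level: to the user
dictionaries of one group to propose sublanguage entries, or to the
sublanguage dictionaries of many groups to propose core entries.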

Other objects, features, and advantages of the present
invention will become apparent from the following detailed
description of the preferred embodiments of the invention, as
considered with reference to the following drawings:

Brief Description of Drawings

Fig. 1 is a schematic diagram of a machine translation and 
telecommunications system in accordance with the invention. 

Fig. 1A is a schematic diagram of a computer server which
includes a receiving interface, recognition module, and dictionary
control module, and is coupled to a machine translation module and an
output module.

Fig. 1B is a schematic diagram of a machine translation
module which includes a translation processing module and a
dictionary database, and its linkage to the computer server and the
output module.

Fig. 1C is a schematic diagram of the output module,
including a page formatting module and a sending interface.

Fig. 2 is an illustration of a cover page for designating
core language pair, sublanguage(s), and recipient information, and
accompanying text pages.

Fig. 3 is an illustration of input ideographic text and
output English text as performed by the machine translation system
using page formatting functions.

Fig. 4 is a schematic diagram of the dictionary control
module, comprising a dictionary selection submodule and a dictionary
maintenance submodule, wherein the latter includes an (interactive)
user maintenance module and a dictionary maintenance module as
overseen by a dictionary maintenance operator (DMO).

Fig. 5 is a schematic representation of an interactive
input editor for interactions with users of the system.

Fig. 6 is a schematic diagram illustrating dictionary
maintenance utilities for collapsing and promotion of entries from
subordinate to superordinate dictionaries.

Fig. 7A illustrates, as a function of the dictionary
maintenance utilities, the creation of a scratch word entry from an
identical word entry.

Fig. 7B illustrates the use of utilities with an
interactive input editor to scan various levels of the dictionary
hierarchy for word entries on which to base scratch word entries.

Fig. 7C illustrates a typical content of an identical word
entry from which a scratch word entry is created.

Fig. 7D illustrates the creation of a "copy-cat" word entry
from a synonymous word entry.

Detailed Description of Preferred Embodiments

Referring to Fig. 1, a preferred form of the machine
translation and telecommunications system in accordance with the
present invention comprises a computer server 10, a machine
translation module 20, and an output module 30. (These and
further-described components of the system will be denoted with
capital letters for clarity of reference.) The Computer Server 10
receives electronic text input accompanied by a cover page or header
from any of a plurality of input sources, designated generally as a
telecommunications link A. The Computer Server 10 has a function for
recognizing control data in the cover page or header designating core
language and sublanguage selections applicable to the input text to
be translated. It also recognizes output addresses and page
formatting data to be used by the Output Module 30 for transmitting
the translated text to the designated recipient(s) via any of a
plurality of output devices, designated generally as a
telecommunications link B. Due to the modularity of the system, the
Machine Translation Module 20 may be updated by operator maintenance
or upgraded or replaced without rendering the other functions of the
system dysfunctional or obsolete.

The Machine Translation Module 20 is capable of performing
machine translation from input text in a source language to output
text in a target language. In the examples of a machine translation
(MT) system described herein, reference is made generally to an MT
system of the transfer type which relies upon the use of a
machine-readable dictionary for lookup of source/target word entries.
The principles of the present invention may also be applied to an MT
system of the interlingua type. Transfer-type MT systems are more
widely accepted for near-term usage than interlingua systems, and
they rely more heavily on linguistic knowledge incorporated into
machine dictionaries designed for source/target language pairs. The
operation of transfer-type MT systems is well understood by those
skilled in the machine translation field, and is not described in
detail herein.

Input Data Reception and Extraction

Fig. 1A shows the Computer Server 10 having a Receiving
Interface 11 linked to the telecommunications link A, a Recognition
Module 12, and a Dictionary Control Module 13. The Receiving
Interface 11 may include an interactive mode program (to be described
further herein) whereby a user can provide cover page or header
designations, update or create User ID files pertinent to translation
parameters associated with that user's communications, or create
specialized user dictionary entries during interactive text entry
sessions. The Recognition Module 12 includes a character recognition
(often referred to as "OCR") program which recognizes and converts
page image data into machine-readable text, and which recognizes
cover page designations or user designations referencing cover page
data stored in the User ID files. The Dictionary Control Module 13
includes a Dictionary Selection Module, which assesses the control
data it receives from the Recognition Module 12 and designates the
appropriate core language and sublanguage dictionary(ies) to be used
by the Machine Translation Module 20. It also includes a Dictionary
Maintenance Module, which allows a dictionary maintenance operator
(DMO) to create and update dictionary entries in the Dictionary
Database 22.

Using the control data from a cover page or header
accompanying the input text, the Computer Server 10 allows the system
to automatically recognize a sender's designations of the source
language of the input text, the target language(s) of the output
text, any particular sublanguage(s) used in a specialized domain,
user group, or correspondence type, any preferred page format for the
output text, and the address(es) of one or more recipients to whom
the translated text is to be sent. Thus, the system can
automatically access designated ones of the plurality of core and
sublanguage dictionaries maintained in the Dictionary Database 22 for
different source/target languages and sublanguages, and can format
and transmit the translated text to recipient(s) in respective target
language(s) via telecommunications link B, without the need for any
substantial human intervention.

The Computer Server 10 interfaces with a plurality of
receiving devices. For example, input data can be received as a
facsimile transmission via a fax/modem board plugged into the I/O bus
for the server system. Such fax/modem boards are widely available
and their operation in a server system is well understood by those
skilled in this field. Input may also be received from a
conventional facsimile machine coupled to a telephone line which
prints facsimile pages converted from signals transmitted on the
telephone line. A conventional page scanner with a sheet feeder can
be used to scan in facsimile or printed pages as page image data for
input to the Computer Server. The page image data is then converted
to machine-readable form by the OCR program. Input may also be
received through a telecommunications program or network interface as
electronic text or text files (such as ASCII text), in which case
conversion by the OCR program is not required.
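
The distinction drawn above, where page image data passes through OCR
while electronic text does not, amounts to a small dispatch step. The
following Python sketch is a hypothetical illustration: the `kind`
labels, the function names, and the placeholder `fake_ocr` are
assumptions, and no real OCR program is invoked.

```python
def to_machine_readable(payload, kind, ocr):
    """Return machine-readable text for one received input.

    kind is "page_image" (fax or scanner data) or "electronic_text".
    ocr is any callable that converts page image data to text.
    """
    if kind == "page_image":
        return ocr(payload)
    if kind == "electronic_text":
        return payload  # already machine-readable; OCR is skipped
    raise ValueError(f"unsupported input kind: {kind!r}")


def fake_ocr(image_bytes):
    # Placeholder standing in for a real OCR program; it only decodes
    # ASCII bytes so that the dispatch logic can be exercised.
    return image_bytes.decode("ascii")
```

Passing the OCR step in as a callable mirrors the modularity described
in this specification: a different recognition program could be
substituted without changing the reception logic.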

The OCR program is preferably resident as an application
program in the Computer Server 10 along with the interface programs
for handling the reception of input data. OCR programs are widely
available, and their operation is well known in this field. For
example, an OCR program for scanning and recognizing Japanese kana
and ideographic characters is offered by Catena Corp., Tokyo, Japan.
An example of an OCR program for alphanumeric characters is
WordScan(TM) offered by Calera Recognition Systems, Santa Clara,
California. The Computer Server 10 is preferably a high-speed,
multi-tasking PC or workstation.

Referring to Fig. 2, the Computer Server 10 receives input
data which is divided into two parts: a cover page or header 50 and
input text 60. In the example shown, a cover page is used in
conjunction with other pages of input text in a page-oriented system.
In the case of transmission of an electronic text file or a text
message, a preceding header or identifier for the communication is
used. The cover page 50 has a number of fields for designating
selections of source/target language(s), sublanguage(s), page format,
and recipient(s) for the text. The cover page 50 is organized with
data fields in a predefined format so that the Recognition Module 12
of the Computer Server 10 can readily recognize the control data in
those fields.
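
For the electronic-text case, the division into a header and input
text can be sketched as follows. The header layout used here, one
"Field: value" line per field followed by a blank line, is purely an
assumption for illustration; the specification requires only that the
fields be in a predefined format. The function and field names are
likewise hypothetical.

```python
def split_message(message):
    """Split a message into (control fields, input text).

    Assumed layout: "Field: value" header lines, a blank line, then
    the text to be translated.
    """
    header_block, _, body = message.partition("\n\n")
    fields = {}
    for line in header_block.splitlines():
        name, separator, value = line.partition(":")
        if not separator:
            raise ValueError(f"malformed header line: {line!r}")
        fields[name.strip().lower()] = value.strip()
    return fields, body
```

A message beginning with the lines `Source: English`, `Target: French`,
and `Recipient: r@example.com` would yield a three-entry dictionary of
control data, with everything after the first blank line returned as
the text to be translated.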

For example, the cover page 50 may be laid out and
formatted with field boundaries and markings on the printed page
which can be optically scanned with a high level of reliability.
Line dividers 51 and large type-size headers 52 may be used to mark
the sender, source/target language(s), sublanguage (communication
type or subject matter), page format, and recipient address fields.
Boxes 53, which can be marked or blackened in, allow the designated
selections to be determined without error. The names of the sender
and recipients, their respective companies, addresses, and telephone
and/or facsimile transmission numbers are determined by character
recognition once the respective fields 51, 52 have been
distinguished. Any page length of input text 60 can follow the cover
page 50. Alternatively, information ordinarily supplied by a cover
page or header may be stored in the User ID files and supplied
automatically as a memorized script in response to user selection.

It is the task of the Recognition Module 12 to extract data
pertinent to dictionary selection from the fields of the cover page
or header. In batch mode this data is predetermined -- it is either
filled into the cover page fields by the user with each specific
translation transaction, or it can be supplied by a reference to the
User Identification (ID) files resident in the Recognition Module 12.



In the Interactive Mode for specifying the cover page or
header through the Receiving Interface 11, the user may first be
presented with predetermined sets of fill-in data and then prompted
for alternative values, or provided with a variety of alternatives
from which to choose, based upon a variety of sets of data already
stored in the User ID files, or based upon inferences drawn from the
data as it is entered by the user. For example, a User A may specify
Recipient Z by name only, and then be presented with additional data,
such as Recipient Z's address, title, or affiliation, already stored
in the User ID files for verification or correction. Alternatively,
Recipient Z may never have been addressed by User A in the past but
may be a user categorized in Domain L, which is a domain of which
User A is also a member, thus triggering the inference that the
sublanguage dictionary of Domain L may be presented to User A as an
option for use.
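
The Domain L inference in this example reduces to a set intersection
over domain memberships. The sketch below assumes, hypothetically, that
the User ID files can be viewed as a mapping from user identifier to a
set of domain names; the function name and data layout are not part of
the specification.

```python
def suggest_sublanguages(sender, recipient, memberships):
    """Return the domains shared by sender and recipient, sorted by name.

    memberships maps a user identifier to the set of domains to which
    that user belongs. Users absent from the mapping share no domains.
    """
    shared = memberships.get(sender, set()) & memberships.get(recipient, set())
    return sorted(shared)
```

With User A in Domains L and M and Recipient Z in Domain L only, the
function returns `["L"]`, which the interactive mode could then offer
to User A as an optional sublanguage dictionary.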

The user may be prompted in Interactive Mode to verify or
choose among field values which may aid in selecting one or more
sublanguage dictionaries for a given translation, including
correspondence types, subject domains, social indicators, etc. By
automating the filling-in of cover page information, the system
maximizes the flexibility and accessibility of its capabilities for
the user while controlling and monitoring the completeness and
cohesiveness of the data supplied.

The cover page may designate a plurality of recipients in
a plurality of address locations and target languages, each of which
may have particular formatting requirements for the output. For
automated assistance, each prospective recipient can be referenced by
an identifying code indexed to data stored in the User ID files. For
example, a travel agent may have a regular set of clients in a
variety of locations and languages, with access to a variety of
communication modes, to whom he or she regularly sends advertising
material. One client may require Japanese translation formatted as
"right-to-left" vertical lines of ideographic characters, to be
printed and sent as ordinary mail. Another may require faxed
translation into German. Still another may have E-mail capability
and require a printed copy as well. All these combinations of
addressees and requirements can be predefined and stored in the User
ID files. The sets of data to be supplied to the cover page fields
for each of these addressees may be indexed to mnemonic codes, such
as the addressee's alphabetic name, which are supplied by the user,
so that they can be retrieved from the User ID files by the
Recognition Module.
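
The travel agent example can be sketched as a table of recipient
profiles indexed by mnemonic code. The names, the profile fields, and
the English target language for the third client are invented for
illustration; the specification does not define a record layout for
the User ID files.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RecipientProfile:
    target_language: str
    page_format: str
    delivery: tuple  # one or more delivery modes


# Hypothetical User ID file entries keyed by lower-cased mnemonic code.
USER_ID_FILE = {
    "tanaka": RecipientProfile("Japanese", "vertical", ("mail",)),
    "schmidt": RecipientProfile("German", "horizontal", ("fax",)),
    "lee": RecipientProfile("English", "horizontal", ("email", "print")),
}


def resolve_recipients(codes, user_id_file=USER_ID_FILE):
    """Look up each mnemonic code; report all unknown codes at once."""
    keys = [code.lower() for code in codes]
    missing = [key for key in keys if key not in user_id_file]
    if missing:
        raise KeyError(f"unknown recipient codes: {missing}")
    return [user_id_file[key] for key in keys]
```

The sender then needs to supply only the codes, and the stored profile
determines the target language, page format, and delivery mode for
each recipient.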

The User ID files may be established at the time of
subscription by a user to a machine translation service, and updated
from time to time thereafter. Using the Interactive Mode, the user
may be prompted to supply his or her name, sex, title, company,
address, group affiliations, source language, etc., as well as data
relevant to prospective recipients or groups of recipients to be
stored in the User ID files for filling in cover pages automatically.
Sublanguage selections appropriate to the user may be identified or
queried by comparing the requirements of the user with those of other
users subscribing to the service.

The user may also be prompted to provide samples of typical
texts expected to be submitted for translation, as well as
individualized or key words for a thesaurus of terms relevant to that
user and/or chosen sublanguages. Automatic utilities may be employed
to make inferences about the choice of sublanguage dictionaries most
appropriate to the new user, based in part on a thesaurus of relevant
terms supplied by individual users and groups of users in the same
subject domain. The recognition of the user's membership in one or
more established groups or subject domains is an important part of
dictionary selection and maintenance.

At the time of each translation transaction, the user may
be prompted by the system to tailor the cover page to the specific
translation transaction about to be initiated. The system may ask
the user to confirm a default cover page configuration, to select or
modify previously established cover page configurations, or to fill
in a scratch cover page which may be blank or partially filled in
with data from the User ID files.

Machine Translation Using Sublanguage Dictionaries 

As shown in Fig. 1B, the Machine Translation Module 20 is
comprised of a Translation Processing Module 21 and a Dictionary
Database 22 containing dictionaries for a number of core
source/target languages I, II, III, IV, etc., each of which may
contain a plurality of domain, subdomain, and user dictionaries. The
Translation Processing Module 21 may be a conventional transfer-type
system, such as the ECS Natural Language Processing System
(hereinafter "ECS/MT system") offered by Executive Communication
Systems, Inc., Provo, Utah. The selection indices for the core
language and sublanguage dictionaries provided by the Dictionary
Selection Module determine which dictionaries in the Dictionary
Database are used. The selected dictionaries may be compiled
together as one operating dictionary, or prioritized and arranged
hierarchically in the system's RAM memory.

The Machine Translation Module is shown as a separate
module which receives system data designating the core language and
sublanguage(s), if any, to be used and the input text from the
Computer Server 10 via the Dictionary Control Module 13. In this
manner, the machine translation functions are kept separate from the
receiving interface, recognition, User ID files, dictionary
selection, dictionary maintenance, and other functions of the
Computer Server 10, so that they can be easily upgraded and/or
replaced with enhanced programs without disruption to the remainder
of the system. The Computer Server 10 acts as a control unit for the
Machine Translation Module 20 by exploiting the functions of the
Dictionary Control Module 13 for selecting the core language and
sublanguage(s) to be used in accordance with the system control data
extracted by the Recognition Module 12.

The Computer Server, Machine Translation Module, and Output
Module may all reside together on the same workstation. A current
target for machine translation systems is a speed of about 20,000 to
30,000 words/hour. A workstation using currently available
transfer-type translation programs can attain this range with a processor
speed of about 50 to 100 MIPS (million instructions per second).
Substantial savings on disk access times can be obtained by providing
a RAM capacity sufficient to hold all selected core and sublanguage
dictionaries in internal memory. For a typical core dictionary size
of 60K entries (100 bytes each) for each of the source, transfer, and
target lexicons, plus three sublanguage dictionaries of 5K x 3
entries each, as well as system program and operations files, a RAM
capacity of the order of 48 MB of internal memory or more is
desirable.
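The dictionary portion of the RAM figure above can be checked with a
short calculation. The sketch assumes that sublanguage entries are also
100 bytes each and that 1 MB is taken as 1,000,000 bytes; neither
assumption is stated in the specification.

```python
ENTRY_BYTES = 100            # stated size of one core dictionary entry
CORE_ENTRIES = 60_000        # entries per core lexicon
LEXICONS = 3                 # source, transfer, and target lexicons
SUB_DICTIONARIES = 3         # selected sublanguage dictionaries
SUB_ENTRIES = 5_000 * 3      # "5K x 3 entries each"
SUGGESTED_RAM_MB = 48

core_bytes = CORE_ENTRIES * ENTRY_BYTES * LEXICONS
# Assumption: sublanguage entries have the same size as core entries.
sub_bytes = SUB_DICTIONARIES * SUB_ENTRIES * ENTRY_BYTES

dictionary_mb = (core_bytes + sub_bytes) / 1_000_000
remaining_mb = SUGGESTED_RAM_MB - dictionary_mb

print(f"dictionaries: {dictionary_mb} MB, left for programs and files: {remaining_mb} MB")
```

Under these assumptions the dictionaries occupy 18 MB (core) plus 4.5 MB
(sublanguage), so roughly half of the suggested 48 MB remains for the
system program and operations files.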

Alternatively, the system may be implemented with separate
processing units. For example, the Computer Server and Output Module
may be implemented as a telecommunications workstation, while the
Machine Translation Module may be implemented via a RISC processor,
parallel processors, or a supercomputer for high-speed batch
processing of multiple source/target languages, sublanguages, and
output formats.

Machine translation is generally performed by passing each
sentence of the text to be translated through a series of stages.
Typically, these stages include: (a) source text dictionary lookup
and morphological analysis; (b) identification of homographs; (c)
identification of compound nouns; (d) processing of prepositions; (e)
identification of nouns and verb phrases; (f) subject-predicate
identification; (g) syntactic ambiguity identification; (h)
processing of idioms; (i) mapping of source structures onto target
structures; (j) synthesis and morphological processing of target
text; and (k) rearrangement of words and phrases in target text.

As an example, the ECS/MT System is a transfer-type system
based on Lexical Functional Grammar theory and constructs. An input
sentence is parsed word by word in left-to-right fashion. Each word
is searched by lookup in the source dictionary to determine its
morphological, lexical, and syntactic attributes. In the ECS
implementation of Lexical Functional Grammar, the indexed attributes
of words are used to call analysis routines or invoke grammar rules
which enable recognition of the word's place and function within a
phrase component of the sentence. Decisions based upon the analysis
rules and analysis process assist in disambiguating the lexical
meaning and phrase structure of the input sentence in the source
language.

The result of parsing in the analysis phase is an
intermediary graph or table representing the source-language phrase
structure of the sentence, mapped to a directed acyclic graph
displaying the grammatical function of words within the sentence and
their lexical attributes. This Lexical Functional Grammar
representation is largely language independent. During the transfer
phase, the functional structure representation of the source-language
sentence is transferred by lexical and syntactic transfer rules into
an equivalent target-language representation of functional structure
and lexical attributes. This target-language representation is then
synthesized into an output sentence using the lexical data and
grammar rules provided by the target language dictionary. A core
language dictionary, including source and target word entries,
bilingual transfer entries, and morphological, syntactic, and lexical
rules for both source and target languages, is required for each
language pair.

In the present invention, the Translation Processing Module
21 also uses a selected sublanguage dictionary containing specialized
word entries and grammar rules specific to a sublanguage that is
particularly applicable to the input text. Each sublanguage
dictionary set up in the Dictionary Database 22 is chosen to have a
manageable size, predictable modes of expression and syntactic
structures, and a well-understood context for disambiguation of
homonyms, polysemic phrases, and specialized references.

In the machine translation field, the term "sublanguage"
usually refers to a recognized domain having a defined set of terms
and patterns of language usage that characterize that domain. In the
present invention, "sublanguage" is used more loosely to refer to any
set of terms and patterns of usage attributed to a field of usage,
group of users, or even an individual user. That is, a "sublanguage
dictionary" is set up on a fluid or ad hoc basis whenever a preferred
set of terms and usages is identified.

As illustrated in Fig. 2, for example, designated
sublanguages might include correspondence types, such as business
letters, legal/technical analysis, technical writing,
financial/market reports, or general writing. Business
correspondence typically employs only a few pages, a limited
vocabulary (on the order of 6000 words), and a limited set of
syntactic structures (often restricted to declarative sentences).

Designated sublanguages may also encompass specific fields,
e.g., technical fields such as physics, chemistry, electronics,
military, etc., or commercial fields such as travel and tourism, real
estate, finance, shipping, insurance, etc., or groups of users such
as associations, corporations, departments, or simply persons in
regular communication with each other.

Sublanguage dictionaries may also be set up corresponding
to socially-determined usages or particular contexts in which certain
communications take place. For example, in some languages, such as
Japanese, certain words, forms of address, and even whole expressions
are determined by the relative age, sex, position, grouping
(internal/external), or environment of the speaker and the person
being addressed. Such particular terms and usages can be set up as
distinct sublanguage dictionaries that are accessed according to
factors identified in the cover page or header for a communication,
e.g., status or sex-indicative titles of the sender and recipient,
positions in their respective companies, locations of the sender and
recipient, etc.

Setting up sublanguage dictionaries can be implemented with
dictionary-building tools currently used in machine translation
systems. For example, the ECS/MT system provides a set of tools to
develop a core dictionary for a chosen language pair and a technical
dictionary for a chosen sublanguage. A Rule Editor tool allows a
linguist to create and modify morphological rules, phrase structure
rules, and transfer rules for the sublanguage. A Dictionary
Maintenance Utility allows creation and modification of lexical
entries, including source entries, target entries, and
source-to-target transfer entries in the dictionary. A Translation Module
performs table-driven translation using linguistic tables, analysis
rules, transfer rules, and semantic preference entries that have been
compiled into the dictionary. A Morphology Module applies rules to
analyze morphologically complex words to determine uninflected forms
for dictionary lookup of source lexical items and to generate
morphologically complex words in the target language. A Semantic
Preference Component provides for the identification of semantic
relations, the assignment of semantic attributes to lexical items,
and the accessibility and matching of these attributes for lexical
disambiguation and selection of preferred translations.

Dictionary Organization and Selection

In the present invention, core language dictionaries and a
plurality of ad hoc sublanguage dictionaries (including both lexical
entries and grammar rules for morphological and syntactic analysis
and generation) are maintained in the system's Dictionary Database
22. The core language dictionaries are developed and maintained
according to linguistic methods and tools commonly used in the
machine translation field. In the present invention, sublanguage
dictionaries are set up for any identified commercial or technical
fields, application domains, groups of users, and even individual
users. No particular effort is made to rigorously identify
sublanguage boundaries or general sublanguage patterns. Instead, a
sublanguage dictionary is set up or updated anytime the vocabulary or
syntactic preferences of a user or group of users can be identified.
Individual user or lower level dictionaries may be combined or
integrated into master (higher-level) sublanguage dictionaries for
any field, application domain, or user group when more general
sublanguage vocabulary or syntactic preferences are identified.

The Dictionary Database embodies at once a hierarchical
structure of nested dictionaries, arranged in order of generality of
usage and exploiting inheritance of linguistic attributes within
entries, and a relational structure, whereby the various dictionaries
and the particular entries within them inform the establishment of
subject domains and the sublanguage dictionaries pertinent to them.
The Dictionary Database includes a core language dictionary which
contains entries for words in their most general usages, as well as
a set of grammar rules for analyzing and generating their
morphological and syntactic structures. In the transfer approach,
the core dictionary must contain three parallel entries for each term
to be translated, i.e., two monolingual entries, one for the source
language and one for the target language, containing information
about the morphological, syntactic, and semantic characteristics of
the word in relation to its own language, and a bilingual transfer
entry specifying details required to translate the source word into
the target, including information on whatever structural changes must
be made during the translation process. The monolingual entries may
be usable in a monolingual dictionary for another source/target
language pair; however, the bilingual (transfer) entries are specific
to the language pair involved.
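The three parallel entries of the transfer approach, and the reuse of
monolingual entries across language pairs, might be modeled as in the
following sketch. The class names, field names, language codes, and
sample words are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class MonolingualEntry:
    language: str
    word: str
    # Morphological, syntactic, and semantic data for the word in its own language.
    features: dict = field(default_factory=dict)


@dataclass
class TransferEntry:
    source_language: str
    target_language: str
    source_word: str
    target_word: str
    structural_changes: list = field(default_factory=list)


class CoreLexicon:
    """Keeps monolingual entries per language and transfer entries per language pair."""

    def __init__(self):
        self.monolingual = {}   # (language, word) -> MonolingualEntry
        self.transfer = {}      # (source language, target language, word) -> TransferEntry

    def add_monolingual(self, entry):
        self.monolingual[(entry.language, entry.word)] = entry

    def add_transfer(self, entry):
        key = (entry.source_language, entry.target_language, entry.source_word)
        self.transfer[key] = entry

    def lookup(self, source_language, target_language, word):
        """Return the (source, transfer, target) triple, or None if any part is missing."""
        source = self.monolingual.get((source_language, word))
        transfer = self.transfer.get((source_language, target_language, word))
        if source is None or transfer is None:
            return None
        target = self.monolingual.get((target_language, transfer.target_word))
        if target is None:
            return None
        return source, transfer, target


lexicon = CoreLexicon()
lexicon.add_monolingual(MonolingualEntry("en", "house", {"pos": "noun"}))
lexicon.add_monolingual(MonolingualEntry("fr", "maison", {"pos": "noun", "gender": "f"}))
lexicon.add_monolingual(MonolingualEntry("de", "Haus", {"pos": "noun", "gender": "n"}))
lexicon.add_transfer(TransferEntry("en", "fr", "house", "maison"))
lexicon.add_transfer(TransferEntry("en", "de", "house", "Haus"))
```

In this sketch the single English entry for "house" serves both the
English-French and English-German pairs, while each pair needs its own
transfer entry.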

As illustrated in Fig. 1B, the Dictionary Database of the
present invention allows for a multiplicity of levels of nested
sublanguage dictionaries along with the core language dictionary. At
the lowest level, user dictionaries may exist for individual users.
The user dictionaries are nested within higher-level "subdomain" or
"master" dictionaries Sub1, Sub2, etc. The subdomain dictionaries
contain more general word entries and grammar/linguistic rules that
are common to the users grouped within or cross-referenced to that
subdomain. The subdomain dictionaries are nested within higher-level
"domain" dictionaries Dom1, Dom2, etc. The domain dictionaries
contain even more general word entries and grammar/linguistic rules
that are common to the subdomains grouped within or cross-referenced
to that domain. At the highest level, the domain dictionaries are
nested within the core dictionary which contains common words and
rules that are generic to most or all of the included domains. The
sublanguage dictionary entries have the same general structure as the
core dictionary entries. Thus, in the transfer approach, the two
monolingual and third transfer entries for each input word must be
available in the sublanguage dictionary.

This hierarchical organization of dictionaries provides for
minimum dictionary lookup time in the translation of a
sublanguage-specific text, because it directs the lookup first to the user
dictionaries, then to the sublanguage dictionaries. When the lower
level dictionaries are searched first, a more accurate, efficient,
and idiomatic translation is likely to be obtained than the
broad-based and more general core dictionary could provide. If a lower
level dictionary cannot analyze and resolve an input item, the
Dictionary Control Module 13 then accesses a next level dictionary
and finally the core dictionary upon failure of translation at a more
specific level. The dictionaries are selected for domain specificity
by the Dictionary Selection Module, but the progression of access is
inherent in the nested structure.
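The progression of access from the most specific dictionary to the
core dictionary can be illustrated with a minimal sketch. Dictionaries
are represented here as plain mappings from a word to an English gloss,
which is a simplification; the level names and sample entries are
hypothetical.

```python
def lookup(word, dictionary_chain):
    """Return (level, entry) from the first dictionary in the chain that holds the word.

    `dictionary_chain` is ordered from most specific to most general,
    so the core dictionary is consulted only when every lower level fails.
    """
    for level, dictionary in dictionary_chain:
        entry = dictionary.get(word)
        if entry is not None:
            return level, entry
    return None


# Hypothetical nested dictionaries for one user in the "Computer" subdomain.
chain = [
    ("user", {"dongle": "hardware key"}),
    ("subdomain", {"mouse": "pointing device"}),
    ("domain", {"driver": "device control software"}),
    ("core", {"mouse": "small rodent", "driver": "person who drives", "table": "furniture"}),
]
```

With this chain, "mouse" resolves at the subdomain level and "driver"
at the domain level, so the more general core senses are never reached
for those words; "table" falls through to the core dictionary.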

Another aspect of the Dictionary Database lies in its
treatment of linguistic information. The generativity of human
grammar relies on the capability of organizing linguistic data into
types, such as Noun, Verb, Adjective, etc., and subtypes, such as
Transitive Verb, Intransitive Verb, etc., i.e., into a hierarchy of
types. It is possible to capitalize on this data classification
computationally. In a preferred embodiment of the present invention,
linguistic data may be specified as objects using well-known
object-oriented programming techniques. These techniques allow messages to
be passed among objects regarding the manipulation of data pertinent
to them and allow objects to inherit characteristics general to an
overriding class of objects of which they are a member. The
sublanguage dictionary organization allows particular features and
processes to be indexed to entries in sublanguage dictionaries
independent of any core dictionary entry for the same word, thus
allowing the sublanguage use of that word to be domain-specific.
Linguistic features and processes common to entries on several levels
of the hierarchy can be used at all levels, thus streamlining
maintenance of the linguistic analysis.
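A hierarchy of linguistic types with inherited characteristics might
look like the following sketch. The class names mirror the types named
above; the attributes, the pluralization rule, and the sample senses
are hypothetical simplifications.

```python
class Word:
    part_of_speech = "unknown"

    def __init__(self, form, **features):
        self.form = form
        self.features = features   # characteristics unique to this entry


class Noun(Word):
    part_of_speech = "noun"

    def plural(self):
        # General rule inherited by every noun unless the entry supplies its own form.
        return self.features.get("plural", self.form + "s")


class Verb(Word):
    part_of_speech = "verb"
    takes_object = None            # left undecided at this level of the hierarchy


class TransitiveVerb(Verb):
    takes_object = True


class IntransitiveVerb(Verb):
    takes_object = False


# A core entry and a sublanguage entry for the same word share the inherited
# noun behaviour but carry their own, independent features.
core_mouse = Noun("mouse", plural="mice", sense="small rodent")
computer_mouse = Noun("mouse", plural="mice", sense="pointing device")
```

Because general behaviour lives in the classes, a change to the noun
rules would apply to entries at every dictionary level, while each
entry keeps its own domain-specific features.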



In its relational aspect, the Dictionary Database can allow
access to and comparison of dictionaries on parallel levels in the
hierarchy. Identical entries from the user dictionaries of several
users in the same domain or group can be merged and "promoted" from
user dictionaries to a higher-level dictionary upon application and
satisfaction of certain reliability criteria. Utilities (described
below) available to the Dictionary Maintenance Operator (DMO) allow
investigation of relations between entries in user, subdomain, and
domain dictionaries for the purpose of detecting and correcting
overlapping or conflicting entries.
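One way to compare user dictionaries on the same level is sketched
below. The reliability criteria are not specified at this point, so a
simple threshold (`min_users`) stands in for them as an assumption; the
function names and the flat entry layout are likewise hypothetical.

```python
from collections import defaultdict


def review_candidates(user_dictionaries, min_users=2):
    """Split shared words into promotable entries and conflicts.

    A word is promotable when every user who has it stores an identical
    entry and at least `min_users` users have it. A word with differing
    entries is reported as a conflict for the DMO to resolve.
    """
    variants = defaultdict(lambda: defaultdict(list))   # word -> frozen entry -> users
    for user, dictionary in user_dictionaries.items():
        for word, entry in dictionary.items():
            variants[word][tuple(sorted(entry.items()))].append(user)

    promotable, conflicts = {}, {}
    for word, by_entry in variants.items():
        if len(by_entry) > 1:
            conflicts[word] = sorted(u for users in by_entry.values() for u in users)
        else:
            (frozen, users), = by_entry.items()
            if len(users) >= min_users:
                promotable[word] = dict(frozen)
    return promotable, conflicts


def promote(promotable, user_dictionaries, higher_dictionary):
    """Move promotable entries into the higher-level dictionary."""
    for word, entry in promotable.items():
        higher_dictionary[word] = entry
        for dictionary in user_dictionaries.values():
            dictionary.pop(word, None)


users = {
    "A": {"mouse": {"pos": "noun", "group": "comp"}, "port": {"pos": "noun", "group": "comp"}},
    "B": {"mouse": {"pos": "noun", "group": "comp"}, "port": {"pos": "noun", "group": "ship"}},
    "C": {"modem": {"pos": "noun", "group": "comp"}},
}
```

Here "mouse" would be promoted because Users A and B hold identical
entries, "port" would be flagged as conflicting, and "modem" would stay
in User C's dictionary because only one user has it.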

As illustrated in Fig. 4, the Dictionary Control Module 13
has two primary functions, Dictionary Selection and Dictionary
Maintenance. In its Dictionary Selection capacity 13a, it determines
the selection of core and sublanguage dictionaries to be used by the
Machine Translation Module 20, based on the control data provided by
the Recognition Module 12. The Dictionary Selection function
includes an Inferencing Engine which assesses cover page or header
data from the Recognition Module 12 and determines the dictionary
selections to be supplied to the Machine Translation Module 20.

Dictionary Selection is exploited at the time of 
translation processing. It applies certain selection and ordering 
algorithms to the cover page or header data it receives from the 
Recognition Module 12 in order to determine the appropriate core and 
sublanguage dictionaries to be used for the translation of a given 
text. For example, the data can contain information as to core 
language pair, subject domain, correspondence type, and social 
indicators ("Mr." or "Mrs.", job titles, etc.), all of which may be 
used by the algorithms of the Inferencing Engine to select applicable 
sublanguage dictionaries. When each sublanguage dictionary is set 
up, specifications as to its usage parameters are indexed in the 
Dictionary Control Module 13.

Many different approaches to the sublanguage dictionary
selection and ordering algorithms may be used depending upon the
level and type of data obtainable from the cover page and other
aspects of the overall system. For example, at a simplest level, the
selection algorithms of the Dictionary Control Module 13 can
designate sublanguage dictionaries which directly correspond to one
or more variables of the cover page or header, e.g., the sender's
name, the "communication type", the subject matter ("re"), etc.

A more developed sublanguage dictionary selection algorithm
can make Boolean inferences about the context of the input text based
upon the cover page data. For example, the user's group or relative
social status can be determined by searching stored group lists or
comparing the titles of the sender and recipient. This is
particularly important for languages where correct terms of speech
and address are dependent upon the context of a communication or the
relative status of the parties. To illustrate, "IF the sender is a
higher-level employee of a travel agency, AND the recipient is a
lower-level employee of a hotel group, AND the communication is a
"travel advisory", THEN USE Sublanguage Dictionary A23 for priority 1
lexical/grammar entries and Sublanguage Dictionary Z4 for priority 2
entries, OTHERWISE default to Core entries".
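Both levels of selection described above, direct correspondence and
Boolean inference, can be sketched in a few lines. The cover page field
names, the `BUSINESS-LETTER` dictionary name, and the rule
representation are hypothetical; the dictionary names A23 and Z4 follow
the illustrative rule in the text.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Rule:
    condition: Callable[[dict], bool]   # Boolean test over cover page fields
    dictionaries: list                  # sublanguage dictionaries in priority order


# Boolean rule modeled on the travel advisory illustration.
RULES = [
    Rule(
        condition=lambda cp: (
            cp.get("sender_org") == "travel agency"
            and cp.get("sender_level") == "higher"
            and cp.get("recipient_org") == "hotel group"
            and cp.get("recipient_level") == "lower"
            and cp.get("communication_type") == "travel advisory"
        ),
        dictionaries=["A23", "Z4"],
    ),
]

# Simplest level: a cover page field value maps directly to a dictionary.
DIRECT_MAP = {("communication_type", "business letter"): "BUSINESS-LETTER"}


def select_dictionaries(cover_page, rules=RULES, direct_map=DIRECT_MAP):
    """Return dictionary names in lookup priority order, always ending with Core."""
    for rule in rules:
        if rule.condition(cover_page):
            return rule.dictionaries + ["Core"]
    selected = [
        name for (field_name, value), name in direct_map.items()
        if cover_page.get(field_name) == value
    ]
    return selected + ["Core"]
```

Returning an ordered list lets the caller search the priority 1
dictionary first, then the priority 2 dictionary, and fall back to the
core entries last.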

Dictionary Maintenance



The Dictionary Maintenance Module 13b enters, tracks,
sorts, indexes, and maintains word entries in the multiplicity of
core and sublanguage dictionaries. The Dictionary Maintenance Module
13b includes an interactive User Maintenance capability 13b-1 with an
Input Editor for creating temporary "scratch" entries in user
dictionaries, and a DMO Maintenance capability 13b-2 with a
programmed Dictionary Maintenance Utility for updating dictionary
entries based on data analyzed and supplied by more general DMO
Assistance Utilities.

A user dictionary may be created or initialized at the time
of subscription by a new user to the machine translation service. In
the Interactive Mode, the new user may be prompted to provide samples
of typical texts expected to be submitted for translation, as well as
individualized words for a thesaurus of terms relevant to that user.
Later, the Input Editor of the User Maintenance Module 13b-1 may
prompt a user to provide basic information that contributes to the
maintenance of the user's dictionary. The Input Editor may be
invoked in a variety of contexts. For example, during a translation
session, the Input Editor may create scratch entries for the user's
dictionary upon encountering unfamiliar words or phrases in the input
text. During a dictionary-building session (outside of a translation
session), the Input Editor may create scratch entries from a list
supplied by the user. During a dictionary-maintenance session
(outside of a translation session), the Input Editor may present the
user with the contents of his or her personal dictionary for
confirmation and updating. In all three contexts, the Input Editor
draws upon algorithms designed to maximize the user's knowledge of
the relationship of the words entered to one or more domains
associated with the user, while requiring a minimum of user knowledge
of linguistic principles or the structure of the Dictionary Database.

The user may also be offered a choice of how elaborate the
interaction is to be. For example, the user may choose to spend the
time necessary to answer numerous questions posed by the Input Editor
about the syntactic and lexical properties of the new word.
Alternatively, the user may choose an abbreviated option designed to
provide only the linguistic information essential to a rudimentary
translation of the word. The user may also be given the option of
not creating a new entry for a particular word but settling for
offering an acceptable substitute word or expression, or passing the
source word into the target text untranslated. All these choices of
interaction result in the creation of records of the interaction for
later examination by the DMO.

As illustrated in Fig. 5, those interactions that do call
for creation of a scratch entry may do so by reference to similar
entries or synonymous words already present in the Dictionary
Database. Upon encountering an unfamiliar word or phrase, the Input
Editor may ask the user whether the word is a domain-specific usage.
If so, the user is prompted to name an appropriate domain or may be
presented with a list of domains established in the Dictionary
Database from which to choose. Then the Input Editor prompts the
user for a one-word synonym of the unfamiliar word. The sublanguage
dictionaries in the appropriate domain(s) are searched for an entry
for that synonym. If one is present in a related sublanguage
dictionary, it is imported to the user's dictionary. If not, the
user may be prompted for additional synonyms and the process is
repeated. If a synonym is not found in a domain-specific dictionary,
the core dictionary may also be searched for the synonymous term.

If an appropriate synonym is found, a "copy-cat" entry is
created for the new word in the user's dictionary, using the new word
as the indexing name of the new entry and the content of the
synonymous entry as its content. The user may be given the choice of
using the content "as is", and the word is then translated in the
manner specified by the content of the "copy-cat" entry. If the user
does not want to use the entry "as is", the Input Editor may prompt
the user for information regarding the new word's syntactic and
lexical characteristics, to ascertain whether it is similar in those
respects to the synonymous word from which the entry content was
copied. The word may then be translated if the interaction results
in an enabling specification of the new word. If not, the scratch
entry is maintained for later review by the DMO, and the user can
choose to offer a substitute word or expression or pass the source
word into the target text untranslated.
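The synonym search and "copy-cat" entry creation described above can be
sketched as follows. Entries are simplified to flat mappings, and the
function name and the `copied_from` provenance field are hypothetical
additions intended only to suggest what a record for later DMO review
might contain.

```python
import copy


def create_copycat_entry(new_word, synonyms, domain_dictionaries, core_dictionary,
                         user_dictionary):
    """Create a copy-cat entry for `new_word` from the first synonym that has an entry.

    Every synonym is tried against the domain dictionaries first; the core
    dictionary is searched only if none of them is found there. Returns
    (synonym, dictionary name) on success, or None when nothing matches.
    """
    passes = [list(domain_dictionaries.items()), [("core", core_dictionary)]]
    for dictionaries in passes:
        for synonym in synonyms:
            for name, dictionary in dictionaries:
                if synonym in dictionary:
                    # Copy the content so later edits do not alter the source entry.
                    entry = copy.deepcopy(dictionary[synonym])
                    entry["copied_from"] = synonym
                    user_dictionary[new_word] = entry
                    return synonym, name
    return None


domains = {"Computer": {"pointer": {"pos": "noun", "sense": "pointing device"}}}
core = {"rodent": {"pos": "noun", "sense": "small animal"}}
user_a = {}
result = create_copycat_entry("mouse", ["rodent", "pointer"], domains, core, user_a)
```

In this example the domain-specific synonym "pointer" is preferred over
the core synonym "rodent" even though "rodent" was offered first, which
reflects the order of search described above.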

Figs. 7A-7D illustrate an example of the creation of
scratch word entries in user dictionaries and their promotion to a
higher-level subdomain or domain dictionary. Fig. 7A is a schematic
representation of the creation of an entry for the word "mouse"
(i.e., a computer peripheral device) in the personal dictionary of
User A, based on a similar entry in the personal dictionary of User
B. Both users are members of the sub-domain "Computer". The
personal dictionaries of both users are nested within the core
dictionary of the language pair English-to-Japanese. Fig. 7A
illustrates this process for a transfer-type machine translation
system; however, it can be implemented for an interlingua system in
an equivalent manner.

User B's personal dictionary contains the three types of
entries necessary for transfer: an English monolingual entry (labeled
"E_word"); an English-to-Japanese bilingual transfer entry (labeled
"EJ_word"); and a Japanese monolingual entry (labeled "J_word;
'computer mouse'"). These entries contain specifications uniquely
labeled to refer to the rules and features that are pertinent to the
word's grammatical functions and linguistic characteristics.
Grammatical functions are specified by frame references, which refer
to general rules for linguistic types (nouns, verbs, etc.). The
inclusion of such references invokes the operation of the relevant
rules by inheritance from files available for use by the MT system as
a whole. Linguistic features are characteristics unique to
individual words, and their values are supplied within the entries
themselves.

For instance, the English monolingual entry illustrated in
Fig. 7A contains a NounType frame reference (E-NType-1) and three
features (E-NFeatX, etc.). The bilingual entry contains a
transfer-rule frame reference (EJ-NType-M) and the specification of the
translation of the word "mouse" (the Japanese translation is
represented here as **). The Japanese monolingual entry contains the
frame reference J-NType-R and three features. The two monolingual
entries contain a group specification, i.e., group = comp (Computer).

Upon encountering the new word "mouse" in text input to the
system by User A, the Input Editor may interact with the user as
follows:

Q: Is this a domain specific usage?
R: Yes.
Q: What is the domain?
R: Computers.

With this information, the Input Editor scans the personal
dictionaries of other users in the "computers" sub-domain and finds
an entry for "mouse" in the dictionary of User B. The Input Editor
performs certain checks to ascertain whether this entry is in fact
pertinent to the domain in question, including a search for group
type, e.g., whether User B's entry for "mouse" contains the
specification group = comp. The group specification for the word
"mouse" may be necessary in addition to User B's membership in the
sub-domain "computer", since User B may be a member of other groups
and domains as well. Additional checks may be performed to determine
the relevance of the entry, in the form of further questions posed to
User A. In this illustration, for example, the Input Editor could
ask User A whether the word "mouse" is a noun (inferring Noun status
from the names of the frame references and features within the
entries), whether it had a positive value for NFeatX, etc.

Upon determination that the entry for "mouse" found in User
B's dictionary is a suitable match for User A's purposes, entries for
User A's dictionary are created by making a copy of all relevant
entries from User B's dictionary, and a record of the transaction is
made for later perusal by the DMO. The copying process allows the
generic functions of the noun to be inherited by User A's entry
through the frame references (illustrated by the small box in the top
center of Fig. 7A), as well as specifying the featural
characteristics unique to the word "mouse".
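The scan of peer dictionaries, the group check, and the copy with a DMO
record can be sketched as below. The entry layout (`source`,
`transfer`, `target`), the function name, the log format, and User C
with group "zoo" are hypothetical; the frame references, the group
value "comp", and the ** placeholder follow the Fig. 7A description.

```python
import copy


def import_from_peers(word, group, requesting_user, subdomain_members,
                      user_dictionaries, dmo_log):
    """Copy the entries for `word` from the first peer whose entries carry `group`.

    Membership of the peer in the sub-domain is not sufficient on its own:
    both monolingual entries must carry the matching group specification.
    Returns the name of the peer used, or None if no suitable entry exists.
    """
    for peer in subdomain_members:
        if peer == requesting_user:
            continue
        entries = user_dictionaries.get(peer, {}).get(word)
        if entries is None:
            continue
        if entries["source"].get("group") != group or entries["target"].get("group") != group:
            continue
        user_dictionaries.setdefault(requesting_user, {})[word] = copy.deepcopy(entries)
        # Record the transaction for later perusal by the DMO.
        dmo_log.append({"word": word, "to": requesting_user, "from": peer, "group": group})
        return peer
    return None


dictionaries = {
    "B": {"mouse": {
        "source": {"frame": "E-NType-1", "group": "comp"},
        "transfer": {"frame": "EJ-NType-M", "translation": "**"},
        "target": {"frame": "J-NType-R", "group": "comp"},
    }},
    "C": {"mouse": {
        "source": {"frame": "E-NType-1", "group": "zoo"},
        "transfer": {"frame": "EJ-NType-M", "translation": "**"},
        "target": {"frame": "J-NType-R", "group": "zoo"},
    }},
}
log = []
source_user = import_from_peers("mouse", "comp", "A", ["A", "C", "B"], dictionaries, log)
```

User C's entry is skipped despite C's sub-domain membership because its
group specification does not match, so User A's entries are copied from
User B.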

If no entries for the word are found in the dictionaries of
users within the "computer" domain, the Input Editor may scan for
entries in both lower- and higher-level dictionaries, including the
core dictionary. Fig. 7B illustrates this broader scanning process.
As the search moves farther from the domain specified by User A, the
Input Editor can proceed with greater caution in selecting candidate
source entries for building User A's scratch entries by further
requiring checks and caveats to the user, and can also include a
notation of the heightened caution in the DMO record. Fig. 7C
illustrates an example where the same entry is found in another user
dictionary in another domain.

If no entries for the same word are found in the available 
dictionaries, the Input Editor may ask for a synonym for the word. 
In this example, User A may respond with "pointer" as a synonym for 
"mouse". The system can then scan the various levels of dictionaries 
for entries for the synonym. If an entry for the synonym is found, 
e.g., in User F's personal dictionary, the Input Editor can pose a 
series of questions to User A based upon inferences made from the 
contents of the found entry or entries, as outlined above. If a 
determination is made that User F's synonymous entry is a suitable 
match for User A's purposes, a "copy-cat" entry is created, with the 
entry label of User A's term (e.g., "mouse") and the contents of 
User F's entry relevant to "pointer". Once again, a record of this 
transaction is made for further oversight by the DMO. Fig. 7D 
illustrates the "copy-cat" entry creation process. 
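
The search order described above (same-domain user dictionaries first, then other dictionaries with heightened caution, then a synonym) can be sketched in Python as follows. This is only an illustrative sketch: the `Entry` and `Dictionary` classes, the `confirm` callback standing in for the questions posed to User A, and the log record layout are assumptions, not structures defined in this specification. 

```python
from dataclasses import dataclass, field


@dataclass
class Entry:
    label: str        # headword, e.g. "mouse"
    features: dict    # linguistic features of the entry


@dataclass
class Dictionary:
    name: str
    domain: str
    entries: dict = field(default_factory=dict)   # label -> Entry


def find_source_entry(word, domain, dictionaries):
    """Search same-domain dictionaries first, then all others.

    Returns (entry, dictionary_name, caution), or None when nothing matches.
    """
    same = [d for d in dictionaries if d.domain == domain]
    other = [d for d in dictionaries if d.domain != domain]
    for caution, group in (("normal", same), ("heightened", other)):
        for d in group:
            if word in d.entries:
                return d.entries[word], d.name, caution
    return None


def create_scratch_entry(word, domain, dictionaries, synonym=None,
                         confirm=lambda entry: True, dmo_log=None):
    """Copy a confirmed match; fall back to a synonym ("copy-cat" entry)."""
    for candidate in (word, synonym):
        if candidate is None:
            continue
        match = find_source_entry(candidate, domain, dictionaries)
        if match is None:
            continue
        entry, source_name, caution = match
        if not confirm(entry):          # stands in for the questions put to User A
            continue
        if dmo_log is not None:         # record of the transaction for the DMO
            dmo_log.append({"word": word, "matched": candidate,
                            "source": source_name, "caution": caution})
        return Entry(label=word, features=dict(entry.features))
    return None
```

In this sketch the copied entry keeps User A's own label while taking its contents from the matched entry, which mirrors the "copy-cat" case where the match was found under the synonym. 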

The addition of scratch entries through the Input Editor of 
the User Maintenance Module 13b-1 into user dictionaries provides the 
MT system with a pre-screened corpus of words for which basic 
information on linguistic features and domain (sublanguage) relations 
has been supplied. Thus, the Dictionary Database can cumulate and 
evolve over time along with actual words and usages encountered in 
texts or supplied by users. 

An important feature of the present invention is the 
capability to move word entries from a lower-level (user, 
subdomain/group, or domain) dictionary into a higher-level 
(subdomain/group, domain, or core) dictionary, when those entries 
have met certain tests of linguistic completeness and more general 
usage that indicate the desirability of inclusion in a higher-level 
dictionary. A language is in a constant state of evolution as new 
words and usages are adopted by individuals and groups and then gain 
currency through larger groups and the society as a whole. The 
movement of entries into more general dictionaries requires review 
and monitoring by a dictionary maintenance operator (DMO) trained in 
linguistics and/or translation of new words and expressions, in order 
to ensure that inaccurate translations or corruptions of the 
Dictionary Database do not occur. The present invention provides for 
certain automated utilities in the DMO Maintenance Module 13b-2 which 
assist the DMO in the movement of word entries to higher-level 
dictionaries. If word entries can satisfy certain tests indicating 
that their upward (more general) movement can be done with a high 
degree of reliability, the DMO assistance utilities may perform the 
movement of such word entries on a fully automated basis. Such 
programmed utilities for DMO assistance or fully automated 
maintenance open the way for substantially computerized management of 
very large dictionary databases, which will enhance the accuracy, 
performance, and utility of machine translation systems. 

The manifold tasks of Dictionary Maintenance are 
fundamental to the operation of the machine translation system 
described herein. The tasks of the DMO are fulfilled, ultimately, by 
adding new dictionary entries and deleting or modifying already 
existing ones. The choice of which entries are to be added or 
deleted, which aspects of entries are to be modified, and which 
dictionaries and sublanguage dictionaries are to be affected by these 
changes is made by the DMO based upon data on usage and effectiveness 
of entries as derived by employing the programmed assistance 
utilities. 

As illustrative examples, the DMO Maintenance Module 13b-2 
may include basic utilities to aid in extracting and organizing word 
entries from the underlying corpora, such as: conversion of compiled 
dictionary entries (including grammar rules) to text files for 
display, for ease of perusal and manual editing; automated creation 
of the records mentioned above and presenting them as text files; 
detection of errors in the structure of dictionary entries and 
presenting faulty entries as text files; extracting and displaying a 
list of all lexical entries by entry name only (a "shelf list" of 
entries in the dictionary); extracting and displaying a list of all 
bilingual lexical entries by entry name and translation only 
(including syntactic category for disambiguation), i.e., a bilingual 
"shelf list"; and/or detection of missing links in inheritance 
hierarchies, which would prevent access to a higher-level dictionary 
without such link(s). 
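
As one illustration of the last utility listed above, the following Python sketch flags dictionaries whose chain of parent links never reaches the core dictionary. The representation of the hierarchy as a simple name-to-parent mapping, and the function name, are assumptions made for this example only. 

```python
def find_missing_links(parents, root="core"):
    """Return names of dictionaries whose parent chain never reaches `root`.

    `parents` maps each dictionary name to the name of its parent
    (None when no parent link is recorded).
    """
    broken = []
    for name in parents:
        seen = set()
        current = name
        while current != root:
            # A chain is broken if it loops, points to an unknown
            # dictionary, or stops before reaching the root.
            if current in seen or current not in parents or parents[current] is None:
                broken.append(name)
                break
            seen.add(current)
            current = parents[current]
    return sorted(broken)
```

A dictionary reported by this check could not inherit from, or fall back to, the higher-level dictionaries above the missing link. 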

This set of extraction and display utilities can assist the 
DMO in maintaining a level of consistency and exhaustiveness of 
control in Dictionary Database monitoring. A primary focus of the 
present invention is identifying, creating, maintaining, and using 
sublanguage dictionaries suited to the lexical and linguistic 
idiosyncrasies of groups of users. These aspects require the ongoing 
assessment and maintenance of the relations between and among 
sublanguage dictionaries, based upon fluid word usage patterns of 
members of the same domains or groups. 

Maximal efficiency in performing these dictionary-relation 
tasks calls for sophisticated utilities to be available to the DMO. 
Such utilities may include the capability to keep track of all 
instances of entries of a new word by users in the same domain. A 
frequency-of-use utility can determine the frequency of use, number 
of entries, and identity of a preferred synonymous entry. A further 
utility can present such data to the DMO for examination for possible 
"promotion" from user dictionaries to a domain dictionary. A 
sublanguage-selection utility can perform an analysis of the above- 
mentioned records and display patterns of use (e.g., frequency and 
consistency of usage), indexed to individual users and groups of 
users, to assess the accuracy of the sublanguage dictionary selection 
process. A homograph entry utility can identify entries that are 
homographs of new entries in user dictionaries in a given domain and 
display them to the DMO for analysis in determining the optimal 
formulation of the homographic entries, and possible collapsing of 
entries into higher-level dictionaries. A quality assurance utility 
can display newly created scratch entries for quality assurance 
checks or for promotion to a higher-level dictionary of approved 
entries. 
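
A minimal Python sketch of the tracking described above is given here: it counts how many users in one domain have entered the same new word and lists the words that might be presented to the DMO as promotion candidates. The `min_users` threshold and the data layout are assumptions for illustration; the specification does not fix any particular threshold. 

```python
from collections import Counter


def promotion_candidates(user_dictionaries, min_users=2):
    """List words entered by at least `min_users` users in one domain.

    `user_dictionaries` maps a user name to the collection of new words
    found in that user's dictionary. Returns (word, user_count) pairs,
    most widely entered first, ties broken alphabetically.
    """
    counts = Counter()
    for words in user_dictionaries.values():
        counts.update(set(words))       # count each user at most once per word
    ranked = [(w, n) for w, n in counts.items() if n >= min_users]
    return sorted(ranked, key=lambda pair: (-pair[1], pair[0]))
```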

As aids in maintaining the consistency of the linguistic 
and semantic feature network of the system, utilities may also be 
provided for: finding and displaying entries containing a certain 
feature-value pair, e.g., displaying entries containing the feature 
[+liquid]; displaying a shelf list of entries containing a certain 
feature-value pair; and displaying the organization of features to 
assist in tracking feature-assignment errors. 

The DMO Assistance Utilities may also employ algorithms to 
perform cross-dictionary comparisons, concordances, integration, 
differentiation, statistical matching, cluster analysis, etc., in 
order to resolve matching, conflicting or overlapping entries in 
different dictionaries. For example, the utilities may be used to 
scan the dictionaries of users from the same domain or group to see 
whether any word entries may be collapsed into a more general word 
entry and "promoted" from the user dictionaries to the domain 
dictionary. An example of such entry promotion is shown in Fig. 6. 

Criteria for a sufficient level of similarity among entries 
in subordinate dictionaries can be measured using the statistical or 
numerical algorithms indicated above. Such measures and others may 
also be employed to determine which characteristics of entries are 
general and thus suited for inclusion in the entries to be created 
for the higher level dictionaries, and which characteristics are 
idiosyncratic to the users whose dictionaries are the source of the 
entries. Thus, promotion of entries to higher level dictionaries 
need not involve their erasure from the user level. The higher level 
entries are promoted with only the applicable general 
characteristics. Idiosyncratic characteristics, if any, may be kept 
in the user dictionaries. Entry promotion can occur between adjacent 
levels within the dictionary hierarchy. 
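
The separation of general from idiosyncratic characteristics can be illustrated with the Python sketch below. It uses a strict intersection of feature-value pairs as a deliberately simple stand-in for the statistical or numerical similarity measures mentioned above; the function name and the feature dictionaries are hypothetical. 

```python
def split_for_promotion(user_entries):
    """Split several users' entries for one word into shared and private parts.

    `user_entries` maps a user name to that user's feature dict for the word
    (feature values must be hashable). Returns (general, idiosyncratic):
    `general` holds the feature-value pairs common to every user, and
    `idiosyncratic` maps each user to the pairs that stay at user level.
    """
    feature_sets = [set(feats.items()) for feats in user_entries.values()]
    general = dict(set.intersection(*feature_sets)) if feature_sets else {}
    idiosyncratic = {
        user: {k: v for k, v in feats.items()
               if k not in general or general[k] != v}
        for user, feats in user_entries.items()
    }
    return general, idiosyncratic
```

Only the `general` part would be written to the higher-level entry, so nothing is erased from the user dictionaries. 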



The DMO Assistance Utilities may also measure co-occurrences 
of words and terms in the input texts of individual users 
or groups of users to determine group membership and relations 
between groups and members, and to infer their characteristics based 
upon characteristics recorded for similar groups or members, or to 
derive general sublanguage patterns for creation of master 
(superordinate) sublanguage dictionaries. Such analyses may also 
yield lists of key words for a thesaurus that is used to select a 
sublanguage dictionary appropriate to certain users. 
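
As a rough illustration of measuring co-occurrence, the following Python sketch counts how often each pair of words appears together in the same input text. Whitespace tokenization and whole-text windows are simplifying assumptions; the specification does not prescribe a particular measure. 

```python
from collections import Counter
from itertools import combinations


def cooccurrence_counts(texts):
    """Count how often each unordered word pair appears in the same text.

    Pairs are stored as alphabetically ordered tuples, e.g. ("icon", "mouse").
    """
    pairs = Counter()
    for text in texts:
        words = sorted(set(text.lower().split()))   # unique words per text
        pairs.update(combinations(words, 2))
    return pairs
```

Pairs with high counts across the texts of several users could then serve as evidence of a shared sublanguage or as key words for the thesaurus. 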

With information contributed by the DMO Assistance Utilities, 
the DMO can create lists of words and phrases to enter, delete, or 
move from one sublanguage to another. Similarly, the DMO can employ 
data supplied by feature extraction utilities to arrive at lists of 
features to alter in the appropriate lists of entries. The DMO may 
create and alter entries by inputting the data to a text file and 
compiling the file into the Dictionary Database. However, a more 
sophisticated and efficient set of utilities can be provided which 
automates the creation and modification of lexical entries, including 
source entries, target entries, and source-to-target entries in a 
dictionary. 

In summary, the Dictionary Control Module allows the overall 
machine translation system to possess a very fluid and highly 
granular sublanguage capability. The sublanguage capability is 
developed and cumulated over time based upon the encountered words 
and identified preferences of actual users, user groups, domains, or 
fields. The multiple sublanguage dictionaries are like the listings 
of synonyms and alternate phrase usages in a real-world dictionary, 
except that the entries can change along with usage, and the capacity 
for domain-specific usages is virtually limitless. Computational 
power in the manipulation of the multiplicity of sublanguage 
dictionaries replaces the need to rigorously define an overall set of 
sublanguage patterns for a given domain. Horizontal expansion of 
sublanguage capability thus replaces vertical definition. 
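
The layered use of user, domain, and core dictionaries summarized above can be pictured with a short Python sketch in which a word is resolved from the most specific dictionary that contains it. The layer names and the ordered-list representation are assumptions chosen for illustration. 

```python
def lookup(word, layers):
    """Return (layer_name, entry) from the most specific layer holding `word`.

    `layers` is a list of (name, entries) pairs ordered from the most
    specific dictionary (user) to the most general (core). Returns None
    when no layer contains the word.
    """
    for name, entries in layers:
        if word in entries:
            return name, entries[word]
    return None
```

With this ordering, a user-level or domain-level usage takes precedence over the generic core entry, and the core entry is used only when no sublanguage entry exists. 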

Output Formatting and Transmission 

As shown in Fig. 1C, the Output Module includes a Page 
Formatting capability 31 driven by formatting instructions extracted 
from the cover page or header for the translated text by the 
Recognition Module 12, and a Sending Interface 32 which transmits the 
formatted output text via telecommunications link B to recipients at 
their addresses as extracted from the cover page and supplied by the 
Recognition Module 12. 

Once the input text has been translated into target 
language text, the Page Formatting capability of the Output Module 
30 composes the translated text into a desired page format based upon 
the formatting information designated on the cover page. For 
example, for English-to-Japanese translation, the output Japanese 
text may be formatted as "left-to-right" horizontal lines of kana, or 
as "right-to-left" vertical lines of ideographic characters. The 
page format may also be designated for "page-by-page" translation, 
wherein the formatting program takes into account the compression 
ratio between the source and target text. For example, as 
illustrated in Fig. 3, English text is typically more spatially 
expansive than ideographic text, so that an 8.5" x 11" input page of 
English text may be reformatted on the same size page with Chinese 
characters of suitably larger point size and interline spacings. 
Correspondingly, a typical 15.2 cm x 25.6 cm page of ideographic text 
may be reformatted as an 8.5" x 11" page of English text, or an 8.5" 
x 11" or A4-size page may be reformatted as an 8.5" x 14" page. 

The formatting program may also implement a footnoting 
function, as shown in the section "F" in Fig. 3, providing footnotes 
for ambiguous phrases of the input text by replicating their original 
source language text (indicated, for example, by a single asterisk) 
and/or providing alternate translations in the target language 
(indicated by double asterisks). The source language phrase and/or 
alternate translation are provided by the Machine Translation Module 
20 by flagging an ambiguous word or phrase which could not be 
resolved in the translation processing. Other well-known page 
formatting functions, e.g., margins, page layout, columns, 
replication of non-translatable graphic images, etc., may also be 
performed by the Output Module 30. 

When the formatted output document is ready for output 
transmission, the Sending Interface 32 becomes operative to generate 
the command signals for controlling the corresponding output devices 
and sending the output document as electronic data signals to the 
respective devices through the telecommunications link B. The output 
devices can include a telephone fax/modem board, a printer which may 
also be coupled with a page facsimile transmission machine and/or an 
automatic mailing machine (for mailing hard copy), or a network 
interface for sending the output data to a recipient's electronic 
address on a network. As shown in Fig. 2, the cover page may 
designate a plurality of recipients in a plurality of target 
languages and located at a plurality of addresses. The Sending 
Interface 32 generates and routes the appropriate forms of output 
data to each recipient. For example, if each recipient is designated 
to receive a fax transmission and a printed copy, the Sending 
Interface routes the data through the fax/modem board to each 
recipient's fax number and also activates the printer and collation 
of pages for automatic mailing. 
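
The routing performed by the Sending Interface can be sketched in Python as the construction of one delivery job per recipient and output channel. The `Recipient` class, the channel names, and the job record layout below are hypothetical and serve only to illustrate the fan-out to multiple recipients, languages, and addresses. 

```python
from dataclasses import dataclass


@dataclass
class Recipient:
    name: str
    language: str     # target language code for this recipient
    channels: dict    # channel name -> address, e.g. {"fax": "..."}


def route_outputs(recipients, translations):
    """Build one delivery job per recipient and channel.

    `translations` maps a target-language code to the formatted output
    text in that language.
    """
    jobs = []
    for r in recipients:
        text = translations[r.language]          # output in the recipient's language
        for channel, address in r.channels.items():
            jobs.append({"to": r.name, "channel": channel,
                         "address": address, "text": text})
    return jobs
```

A recipient designated for both fax and printed copy therefore yields two jobs carrying the same translated text, one per output device. 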

General Telecommunications Use 

The above-described machine translation system can be 
adapted to and installed as a resident utility or service in 
telecommunications systems or networks, such as private and public 
networks and gateway companies, telecommunications companies, and bi- 
or multi-lingual information service providers. The input to the 
telecommunications system is preferably in the form of electronic 
text transmissions in the near term. However, with further 
development, input in the form of graphics (facsimile) data and even 
speech can be captured, scanned, and converted to intermediary text 
for translation processing and output formatting and transmission in 
any form. As the Dictionary Database develops in depth of coverage 
of particular sublanguages and in breadth of coverage of many 
different domains, the system will acquire the accuracy and 
capability to handle communications over a wide range of fields 
satisfactorily. Mass storage and inexpensive processing power and 
speed can be effectively utilized to handle many different language 
pairs, technical fields, domains, user groups, and individual users 
for near-simultaneous translations in a host of languages. 

For near term use, the machine translation system is 
particularly suitable for translating electronic text for E-mail, 
electronic bulletin boards, and information services in 
telecommunications networks. As described above, an Interactive Mode 
may be provided through a telecommunications program or a network 
system to interact with an online user inputting text to be 
translated and transmitted on the network. In this mode the user may 
be prompted to fill in cover page fields or create and maintain User 
ID files, to aid in dictionary selection, or update the Dictionary 
Database, to enable the system to translate new words encountered in 
the input. 

For electronic text input and output in different 
languages, it is desirable to have a standardized interface to the 
many different character code conventions used throughout the world. 
A universal character code convention has been developed by the 
Unicode Consortium, Mountain View, California. The Consortium 
includes IBM, DEC, Apple, and other major American computer 
companies. The Unicode set is a 16-bit character code set that is 
mapped to the major character code conventions of the world, 
including the major Roman alphabetic systems and Asian character 
systems. For example, the Unicode set is mapped to the Han character 
sets of the major industry and national standards used in China, 
Japan, Korea, and Taiwan. Thus, a Unicode character converter module 
can be employed as the standardized interface for electronic text in 
a telecommunications system. 
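
The role of such a converter module can be illustrated with a brief Python sketch that uses Unicode as the pivot between two character code conventions. The function name is hypothetical; the sketch relies on Python's built-in codecs (for example `shift_jis`, `euc_jp`, and `big5`) and on Python strings being Unicode, and it is not the converter module of this specification. 

```python
def convert_text(data: bytes, source_encoding: str, target_encoding: str) -> bytes:
    """Re-encode text by decoding to Unicode, then encoding to the target."""
    text = data.decode(source_encoding)      # source convention -> Unicode string
    return text.encode(target_encoding)      # Unicode string -> target convention
```

For instance, text received in Shift JIS could be passed through `convert_text(data, "shift_jis", "euc_jp")` before delivery to a recipient whose system expects EUC-JP. 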



Text input can also be scanned in from printed pages or 
from transmissions via a fax/modem. The system's Recognition Module 
is used to convert such scanned page image data into machine-readable 
text. Currently, off-the-shelf programs are available for English 
alphanumerics and Japanese kana. Future developments in character 
recognition programs for other character sets, such as Chinese Han 
and Japanese kanji characters, and even handwritten characters, can 
be expected to further the development of page-oriented translation 
systems. 

CLAIMS: 

1. A machine translation system for translation of input 
texts sent from a plurality of different users, wherein each of said 
users may have a preferred sublanguage of text terminology used in the 
user's input text out of a plurality of possible sublanguages, and 
wherein a preferred sublanguage of a user is determinable from one or 
more parameters of the user's identity, said machine translation 
system comprising: 

(a) a receiving interface for receiving a series of 
translation jobs to be translated in sequence, each translation job 
comprising an input text and accompanying control input including user 
ID data identifying a user sending the input text, wherein said 
receiving interface includes means for identifying the input texts and 
accompanying control inputs for each one of the series of translation 
jobs and for queueing the input texts for translation in sequence; 

(b) a machine translation module for performing machine 
translation of the input texts in sequence by translating each input 
text in a source language to an output text in a target language using 
a dictionary database containing entries for words of the source and 
target languages; 

(c) a dictionary database including a core dictionary 
containing entries for generic words of the source and target 
languages, and a plurality of sublanguage dictionaries each containing 
entries for specialized words of a respective one of a plurality of 
sublanguages handled by said machine translation system for the 
source/target languages; 

(d) a recognition module including a memory section for 
storing a plurality of user ID files each of which contains previously 
stored user sublanguage preference information which is indexed to 
user ID data for each respective user of said plurality of users, said 
user sublanguage preference information being indicative of a 
sublanguage of text terminology preferred by the respective user for 
translation of an input text from that user, wherein said recognition 
module is responsive to the user ID data received by said receiving 
interface to retrieve the user sublanguage preference information 
stored in the user ID file indexed to the user ID data; 

(e) a dictionary control module responsive to the particular 
user sublanguage preference information retrieved by said recognition 
module for selecting a corresponding one of the plurality of 
sublanguage dictionaries of the dictionary database, and for causing 
the machine translation module to use the selected sublanguage 
dictionary along with the core dictionary for translation of the 
particular input text of each respective user; and 

(f) an output module for outputting text in the target 
language translated by the machine translation module for each one of 
the input texts, whereby said receiving interface identifies for each 
translation job in sequence the input text to be translated and the 
control input including user ID data identifying a particular user 
sending the particular input text and forwards the particular user ID 
data to the recognition module, the recognition module retrieves the 
particular user sublanguage preference information from the user ID 
files indexed to the particular user ID data and forwards it to the 
dictionary control module, and the dictionary control module selects 
the preferred sublanguage dictionary out of the plurality of 
sublanguage dictionaries that corresponds to the particular user 
sublanguage preference information for use by the machine translation 
module. 

2. A machine translation system according to Claim 1, 
wherein said dictionary database contains a plurality of core language 
dictionaries corresponding respectively to a plurality of 
source/target languages for machine translation by said machine 
translation module, wherein said control input for each translation 
job includes a source/target languages control input indicative of a 
selected source/target core language applicable to the accompanying 
input text, and said dictionary control module is responsive to the 
source/target languages control input identified by said receiving 
interface and causes said machine translation module to use a 
corresponding source/target core language dictionary in performing 
translation of the input text. 

3. A machine translation system according to Claim 1, 
wherein said dictionary control module contains an inferencing program 
for selecting an applicable sublanguage dictionary based upon said 
sublanguage preference information indicating one or more parameters 
of a user's identity including title, sex, company, job position, 
address, user group, and subject matter. 

4. A machine translation and telecommunications system 
comprising: 

(a) a receiving interface for receiving via a first 
telecommunications link an input text in a source language accompanied 
by a control input including a first predefined field containing an 
address of a recipient to receive output text translated to a target 
language and a second predefined field containing a source/target 
language control input indicative of a selected source/target language 
pair for translation applicable to the input text from among a 
plurality of possible source/target language pairs; 

(b) a machine translation module capable of performing 
machine translation of an input text in a source language to an output 
text in a target language using a dictionary database containing 
entries for words of the target language corresponding to words of the 
source language; 

(c) a dictionary database containing a plurality of 
source/target language dictionaries, each containing entries for 
generic words for translation between a source and target language 
pair; 

(d) a dictionary control module responsive to the 
source/target language control input for selecting a source/target 
language dictionary of the dictionary database which is applicable to 
the input text, and for causing the machine translation module to use 
the selected source/target language dictionary in performing 
translation of the input text to the designated target language; and 

(e) an output module responsive to the address of the 
control input for outputting translated text in the target language 
generated by the machine translation module and automatically routing 
it to be sent to the recipient's address. 

5. A machine translation and telecommunications system 
according to Claim 4, wherein the control input includes a third 
predefined field containing a sublanguage control input for selecting 
a sublanguage of a source/target language to be used for translation 
of the input text, said dictionary database containing a plurality of 
sublanguage dictionaries, each containing entries for specialized 
words of a sublanguage domain for translation within a source and 
target language pair, and said dictionary control module being 
responsive to the sublanguage control input for selecting a 
sublanguage dictionary of the dictionary database which is applicable 
to the input text, and for causing the machine translation module to 
use the selected sublanguage dictionary in performing translation of 
the input text. 

6. A machine translation and telecommunications system 
according to Claim 4, further comprising a recognition module coupled 
to said receiving interface for electronically scanning the control 
input and recognizing the address of the recipient designated in the 
first predefined field of the control portion, and said output module 
including a sending interface for sending the translated output text 
generated by said machine translation module to the address of the 
recipient recognized by said recognition module. 

7. A machine translation and telecommunications system 
according to Claim 4, wherein said receiving interface includes a 
programmed interaction module for interactive input from a user 
through a user interface to said system via said first 
telecommunications link. 

8. A machine translation and telecommunications system 
according to Claim 4, wherein said input text and translated output 
text are transmitted as electronic text, and said output module 
transmits the translated output text via a second telecommunications 
link. 

9. A machine translation and telecommunications system 
according to Claim 8, wherein said system is installed as a resident 
utility or server in a telecommunications system or network. 

10. A machine translation and telecommunications system 
according to Claim 8, wherein said receiving interface and said output 
module include an electronic character code interface for receiving 
and sending electronic text in any of a plurality of electronic 
character coding conventions used by senders and recipients in 
different languages. 

11. A machine translation and telecommunications system 
according to Claim 4, wherein said input text is received as graphics 
data, and said receiving interface includes a character recognition 
module for converting text content contained in said graphics data to 
machine-readable electronic text. 

12. A machine translation and telecommunications system 
according to Claim 4, wherein said output module includes a graphics 
output module for converting machine-readable output text to graphics 
data for facsimile or other graphics transmission. 

13. A machine translation and telecommunications system 
according to Claim 4, wherein said output module is configured to send 
the translated output text together with the input text to the 
recipient's address to allow verifying of the translation. 

14. A machine translation and telecommunications system 
comprising: 

(a) a receiving interface for receiving via a first 
telecommunications link an input text in a source language accompanied 
by a control input including a first predefined field designating an 
address of a recipient to receive output text translated to a target 
language and a second predefined field containing a source/target 
language control input designating a selected source/target language 
pair for translation applicable to the input text from among a 
plurality of source/target language pairs; 

(b) a machine translation module capable of performing 
machine translation of an input text in a source language to an output 
text in a target language, said module having a plurality of 
source/target language pair submodules any one of which can be 
operated for automatically performing machine translation in a 
selected source/target language pair; 

(c) a control module responsive to the source/target 
language control input of the second predefined field for selecting a 
source/target language pair submodule for operating said machine 
translation module to perform machine translation of the accompanying 
input text in the source/target language pair to the designated target 
language; and 

(d) an output module responsive to the recipient's address 
of the first predefined field for outputting the translated text in 
the target language generated by the machine translation module and 
automatically routing it for sending to the recipient's address. 

15. A machine translation and telecommunications system 
according to Claim 14, wherein said source/target language pair 
submodules are provided with respective dictionary databases for 
performing translation in their respective source/target language 
pairs, and each dictionary database includes a plurality of 
sublanguage dictionaries containing entries for specialized words of 
respective sublanguage domains within the respective source/target 
language pair, wherein the control input includes a third predefined 
field containing a sublanguage control input for selecting a 
sublanguage domain of a source/target language pair that is preferred 
for translation of the accompanying input text, and wherein said 
control module is responsive to the sublanguage control input of the 
third predefined field for selecting the preferred sublanguage 
dictionary of the dictionary database and operating said machine 
translation module to perform machine translation of the input text 
using the preferred sublanguage dictionary. 

16. A machine translation and telecommunications system 
according to Claim 14, wherein said input text and translated output 
text are transmitted as electronic text, and said output module 
transmits the translated output text via a second telecommunications 
link. 

17. A machine translation and telecommunications system 
according to Claim 16, wherein said system is installed as a resident 
utility or server in a telecommunications system or network. 

18. A method of automatically translating text from a 
source language into a target language and sending the translated 
output text to a designated recipient via a telecommunications network 
comprising the steps of: 

(a) sending input text as electronic text via a first 
telecommunications link to a resident utility or server on the 
telecommunications network, said electronic input text being in a 
source language and being accompanied by a control input including a 
first predefined field designating an address of a recipient 
addressable through the telecommunications network to receive output 
text translated to a target language and a second predefined field 
containing a source/target language control input designating a 
selected source/target language pair for translation applicable to the 
input text from among a plurality of source/target language pairs; 

(b) accessing at the resident utility or server a machine 
translation module capable of performing machine translation of an 
input text in a source language to an output text in a target 
language, said machine translation module having a plurality of 
source/target language pair submodules any one of which can be 
operated for automatically performing machine translation in a 
selected source/target language pair, and said machine translation 
being responsive to the source/target language control input of the 
second predefined field for selecting a source/target language pair 
submodule for performing machine translation of the accompanying input 
text in the source/target language pair to the designated target 
language; and 

(c) automatically sending the translated output text as 
electronic text via a second telecommunications link through the 
telecommunications network to the recipient's address as designated in 
the first predefined field of the control input accompanying the input 
text. 
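
Steps (a) through (c) of Claim 18 can be illustrated with a minimal sketch. It is not the claimed implementation: the class names, the field names, the in-memory `outbox` standing in for the second telecommunications link, and the toy word-for-word English-to-French submodule are all assumptions introduced for illustration only.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

LanguagePair = Tuple[str, str]  # (source, target), e.g. ("en", "fr")


@dataclass
class ControlInput:
    recipient_address: str       # first predefined field: recipient's address
    language_pair: LanguagePair  # second predefined field: source/target pair


@dataclass
class Message:
    control: ControlInput
    text: str


class TranslationServer:
    """Hypothetical resident utility: selects a submodule, then forwards the output."""

    def __init__(self,
                 submodules: Dict[LanguagePair, Callable[[str], str]],
                 send: Callable[[str, str], None]) -> None:
        self.submodules = submodules  # one submodule per language pair
        self.send = send              # stand-in for the second telecommunications link

    def handle(self, message: Message) -> str:
        pair = message.control.language_pair
        if pair not in self.submodules:
            raise KeyError(f"no submodule for language pair {pair}")
        translated = self.submodules[pair](message.text)           # step (b)
        self.send(message.control.recipient_address, translated)   # step (c)
        return translated


def toy_en_fr(text: str) -> str:
    """Toy submodule: word-for-word lookup, unknown words pass through unchanged."""
    glossary = {"hello": "bonjour", "world": "monde"}
    return " ".join(glossary.get(word, word) for word in text.lower().split())


outbox: List[Tuple[str, str]] = []
server = TranslationServer({("en", "fr"): toy_en_fr},
                           send=lambda address, body: outbox.append((address, body)))

# Step (a): input text arrives accompanied by its control input.
result = server.handle(
    Message(ControlInput("user@example.com", ("en", "fr")), "Hello world"))
```

The point of the sketch is that the sender supplies only the two control fields; the server needs no further interaction to choose a submodule and route the output.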

19. A method of automatically translating and sending text 
via a telecommunications network according to Claim 18, wherein the 
electronic text is transmitted as E-mail, transmitted to or from 
electronic bulletin boards, or transmitted to or from information 
service providers via the telecommunications network. 

20. A method of automatically translating and sending text 
via a telecommunications network according to Claim 18, further 
comprising the step of converting electronic input text sent to and 
from the resident utility or server in any of a plurality of 
electronic character coding conventions used by senders and recipients 
in different languages. 
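
The conversion step of Claim 20 can be sketched as a decode and re-encode between two character coding conventions. The function name and the choice of Latin-1, Shift JIS, and UTF-8 as example codings are assumptions for illustration; the claim does not name particular codings.

```python
def convert_coding(data: bytes, sender_coding: str, recipient_coding: str,
                   errors: str = "strict") -> bytes:
    """Re-encode electronic text from the sender's coding to the recipient's.

    With errors="strict", a character that the recipient coding cannot
    represent raises UnicodeEncodeError rather than being silently dropped.
    """
    text = data.decode(sender_coding)
    return text.encode(recipient_coding, errors)


# A Latin-1 message re-encoded as UTF-8 for the recipient.
latin1_bytes = "café".encode("latin-1")
utf8_bytes = convert_coding(latin1_bytes, "latin-1", "utf-8")

# A Shift JIS message re-encoded as UTF-8.
sjis_bytes = "日本語".encode("shift_jis")
utf8_japanese = convert_coding(sjis_bytes, "shift_jis", "utf-8")
```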

Abstract of the Disclosure 

A machine translation and telecommunications system includes 
a machine translation [engine] module for translation of input text 
[from a source language to a] between any of a plurality of source and 
target languages, using a dictionary database including a plurality of 
core [dictionary] dictionaries and a plurality of sublanguage (domain) 
dictionaries usable for translation from a source to a target 
language, a receiving interface for receiving text input from any of 
a plurality of users, each text input being accompanied by control 
[information including user ID data indicative of] input designating 
an address of a recipient of the translated output and a selected 
source/target language pair and/or one or more sublanguages preferred 
[by a particular user] for the translation, and an output interface 
for automatically routing the translated text to the recipient's 
address. [, and a dictionary control module coupled to the receiving 
interface responsive to the user ID data indicative of a sublanguage 
preference of a particular user for selecting a corresponding 
sublanguage dictionary of the dictionary database to be used by the 
machine translation engine along with the core dictionary for 
performing translation of the particular user's text input. User 
dictionaries can be maintained and selected to enhance translation 
accuracy in the same manner. The dictionary database encompassing 
core, sublanguage (domain), and user dictionaries is cumulated for 
greater capability over time through the use of dictionary maintenance 
utilities for updating the dictionaries.] The system is particularly 
adaptable for sending and receiving electronic text to and from a 
resident utility or server which performs the translation and routing 
functions automatically on a telecommunications network. 

FIG. 6 

COLLAPSING AND PROMOTION OF ENTRIES FROM SUBORDINATE TO SUBORDINATE DICTIONARIES 

DOMAIN "ELECTRONICS" DICTIONARY 
    (receives entries by PROMOTION from the sub-domain dictionary) 

SUB-DOMAIN "COMPUTER" DICTIONARY 
    E_mouse: Gen'l char. A, Gen'l char. B, Gen'l char. C, Gen'l char. D, etc. 
    (receives entries by PROMOTION from the user dictionaries) 

SUB-DOMAIN "COMPUTER" DICTIONARY, user level 
    User P  E_mouse: Gen'l char. A, B, C, D; User char. X, Y, Z; etc. 
    User Q  E_mouse: Gen'l char. A, B, C, D; User char. L, M, N; etc. 
    User R  E_mouse: Gen'l char. A, B, C, D; User char. S, T, U; etc. 
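
The promotion shown in FIG. 6 can be sketched as a set intersection: characteristics that appear in every user's entry for a headword are candidates for promotion to the sub-domain dictionary. This is only one reading of the figure. The function names and the rule "promote what all users share" are assumptions, and the figure does not say whether user dictionaries keep their copies of the general characteristics after promotion.

```python
from typing import Dict, Set


def promote_common(user_entries: Dict[str, Set[str]]) -> Set[str]:
    """Return the characteristics present in every user's entry for one headword."""
    sets = list(user_entries.values())
    return set.intersection(*sets) if sets else set()


def user_specific(user_entries: Dict[str, Set[str]],
                  promoted: Set[str]) -> Dict[str, Set[str]]:
    """Return, per user, the characteristics that were not promoted."""
    return {user: chars - promoted for user, chars in user_entries.items()}


# Entries for the headword "mouse", using the labels from FIG. 6.
mouse_entries = {
    "User P": {"A", "B", "C", "D", "X", "Y", "Z"},
    "User Q": {"A", "B", "C", "D", "L", "M", "N"},
    "User R": {"A", "B", "C", "D", "S", "T", "U"},
}

promoted = promote_common(mouse_entries)
remaining = user_specific(mouse_entries, promoted)
```

Here `promoted` holds the general characteristics A through D, and `remaining` holds each user's own characteristics.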

FIG. 7b 

SCANNING PROCESS OF LEVELS OF DICTIONARIES BELOW 
AND ABOVE DOMAIN SUPPLIED TO INPUT EDITOR BY USER A 

CORE LANGUAGE PAIR DICTIONARY: MOUSE? 

SUB-DOMAIN "COMPUTER" 
    USER D: MOUSE? 
    USER E: MOUSE? 
    USER F: MOUSE? 

USER GROUP X DICTIONARY 
    USER L: MOUSE? 
    USER M: MOUSE? 
    USER N: MOUSE? 




FIG. 7c 

USER C's ENTRIES RELEVANT TO "MOUSE" 

E_mouse: 
group=comp 
DATA WITH A MOUSE" 

REISSUE APPLICATION DECLARATION BY THE INVENTOR 

Docket Number (Optional) 
TLC-RE 



As a below named inventor, I hereby declare that: 

My residence, post office address and citizenship are stated below next to my name. 
I believe I am the original, first and sole inventor (if only one name is listed below) or an original, first 
and joint inventor (if plural names are listed below) of the subject matter which is described and claimed 
in patent number USP 5,535,120 granted July 9, 1996, and for which a 
reissue patent is sought on the invention entitled MACHINE TRANSLATION AND TELECOMMUNICATIONS 
SYSTEM USING USER ID DATA TO SELECT DICTIONARIES, 

the specification of which 
☒ is attached hereto. 
□ was filed on ______ and was amended on ______ as reissue application number ______ (If applicable) 

I have reviewed and understand the contents of the above identified specification, including the claims, 
as amended by any amendment referred to above. 

I acknowledge the duty to disclose information which is material to patentability as defined in 
37 CFR 1.56. 

I verily believe the original patent to be wholly or partly inoperative or invalid, for the reasons described 
below. (Check all boxes that apply.) 

□ by reason of a defective specification or drawing. 

☒ by reason of the patentee claiming more or less than he had the right to claim in the patent. 

□ by reason of other errors. 

At least one error upon which reissue is based is described as follows: 

The issued claims refer to the selection of dictionaries for 
machine translation by user ID data, but do not refer to selection 
of source/target languages by user ID data or by text routing information, 
which was within the main concept of the invention. Thus, 
the issued claims are believed to be insufficient. 

All errors corrected in this reissue application arose without any deceptive intention on the part of the 
applicant. As a named inventor, I hereby appoint the following attorney(s) and/or agent(s) to prosecute 
this application and transact all business in the Patent and Trademark Office connected therewith. 



Name(s)                    Registration Number 
Leighton K. Chong          27,621 



Correspondence Address: Direct all communications about the application to: 

Firm or Individual Name: Leighton K. Chong, Firm of Ostrager Chong Flaherty & Onofrio 
Address: 841 Bishop Street, Suite 1200 
City: Honolulu 
State: HI 
ZIP: 96813-3908 
Country: U.S.A. 
Telephone: (808) 533-4300 
Fax: (808) 531-75[illegible] 



I hereby declare that all statements made herein of my own knowledge are true and that all statements made 
on information and belief are believed to be true; and further that these statements were made with the 
knowledge that willful false statements and the like so made are punishable by fine and imprisonment, 
or both, under 18 U.S.C. 1001, and that such willful false statements may jeopardize the validity of the 
application, any patent issuing thereon, or any patent to which this declaration is directed. 



Full name of sole or first inventor (given name, family name) 
Leighton K. CHONG 

Inventor's signature                    Date 

Residence 
133 Kaai Street, Honolulu, HI 

Post Office Address 
(same) 

Citizenship 
U.S.A. 



Full name of second joint inventor (given name, family name) 
Christine K. KAMPRATH 

Inventor's signature                    Date 

Residence 
100 Oakwood Park, Peoria, IL 

Post Office Address 
(same) 

Citizenship 
U.S.A. 



Full name of third joint inventor (given name, family name) 



Inventor's signature 



Residence 



Date 



Citizenship 



Post Office Address 



□ Additional joint inventors are named on separately numbered sheets attached hereto. 

STATEMENT CLAIMING SMALL ENTITY STATUS 
(37 CFR 1.9(f) & 1.27(b))-INDEPENDENT INVENTOR 



Docket Number (Optional) 
TLC-RE 



Applicant, Patentee, or Identifier: L. CHONG, K. KAMPRATH 

Application or Patent No.: USP 5,535,120 (Reissue) 

Filed or Issued: July 9, 1998 



Title: MACHINE TRANSLATION AND TELECOMMUNICATIONS SYSTEM USING 
USER ID DATA TO SELECT DICTIONARIES 

As a below named inventor, I hereby state that I qualify as an independent inventor as defined in 37 CFR 1.9(c) 
for purposes of paying reduced fees to the Patent and Trademark Office described in: 

□ the specification filed herewith with title as listed above. 

□ the application identified above. 

☒ the patent identified above. 

I have not assigned, granted, conveyed, or licensed, and am under no obligation under contract or law to assign, 
grant, convey, or license, any rights in the invention to any person who would not qualify as an independent inventor 
under 37 CFR 1.9(c) if that person had made the invention, or to any concern which would not qualify as a small 
business concern under 37 CFR 1.9(d) or a nonprofit organization under 37 CFR 1.9(e). 

Each person, concern, or organization to which I have assigned, granted, conveyed, or licensed or am under an 
obligation under contract or law to assign, grant, convey, or license any rights in the invention is listed below: 

□ No such person, concern, or organization exists. 

☒ Each such person, concern, or organization is listed below. 

TRANS-LINK INTERNATIONAL CORP., a small entity 



Separate statements are required from each named person, concern, or organization having rights to the invention 
stating their status as small entities. (37 CFR 1.27) 

I acknowledge the duty to file, in this application or patent, notification of any change in status resulting in loss of 
entitlement to small entity status prior to paying, or at the time of paying, the earliest of the issue fee or any 
maintenance fee due after the date on which status as a small entity is no longer appropriate. (37 CFR 1.28(b)) 



Leighton K. CHONG            Christine K. KAMPRATH 
NAME OF INVENTOR             NAME OF INVENTOR             NAME OF INVENTOR 

Signature of inventor        Signature of inventor        Signature of inventor 

Date                         Date                         Date 



Burden Hour Statement: This form is estimated to take 0.2 hours to complete. Time will vary depending upon the needs of the individual case. Any 
comments on the amount of time you are required to complete this form should be sent to the Chief Information Officer, Patent and Trademark Office, 
Washington, DC 20231. DO NOT SEND FEES OR COMPLETED FORMS TO THIS ADDRESS. SEND TO: Assistant Commissioner for Patents, Washington, 
DC 20231. 

STATEMENT CLAIMING SMALL ENTITY STATUS 
(37 CFR 1.9(f) & 1.27(c))-SMALL BUSINESS CONCERN 



Docket Number (Optional) 
TLC-RE 



Applicant, Patentee, or Identifier: L. CHONG, K. KAMPRATH 
Application or Patent No.: USP 5,535,120 (Reissue) 
Filed or Issued: July 9, 1998 



Title: MACHINE TRANSLATION AND TELECOMMUNICATIONS SYSTEM USING 
USER ID DATA TO SELECT DICTIONARIES 

I hereby state that I am 

□ the owner of the small business concern identified below: 
☒ an official of the small business concern empowered to act on behalf of the concern identified below: 

NAME OF SMALL BUSINESS CONCERN: TRANS-LINK INTERNATIONAL CORP. 

ADDRESS OF SMALL BUSINESS CONCERN: 133 Kaai Street, Honolulu, HI 96821 

I hereby state that the above identified small business concern qualifies as a small business concern as defined in 
13 CFR Part 121 for purposes of paying reduced fees to the United States Patent and Trademark Office, in that the number 
of employees of the concern, including those of its affiliates, does not exceed 500 persons. For purposes of this statement, 
(1) the number of employees of the business concern is the average over the previous fiscal year of the concern of the persons 
employed on a full-time, part-time, or temporary basis during each of the pay periods of the fiscal year, and (2) concerns 
are affiliates of each other when either, directly or indirectly, one concern controls or has the power to control the other, or 
a third party or parties controls or has the power to control both. 

I hereby state that rights under contract or law have been conveyed to and remain with the small business concern 
identified above with regard to the invention described in: 

□ the specification filed herewith with title as listed above. 

□ the application identified above. 
☒ the patent identified above. 

If the rights held by the above identified small business concern are not exclusive, each individual, concern, or 
organization having rights in the invention must file separate statements as to their status as small entities, and no rights 
to the invention are held by any person, other than the inventor, who would not qualify as an independent inventor under 
37 CFR 1.9(c) if that person made the invention, or by any concern which would not qualify as a small business concern 
under 37 CFR 1.9(d), or a nonprofit organization under 37 CFR 1.9(e). 

Each person, concern, or organization having any rights in the invention is listed below: 
no such person, concern, or organization exists. 

□ each such person, concern, or organization is listed below. 



Separate statements are required from each named person, concern or organization having rights to the invention 
stating their status as small entities. (37 CFR 1.27) 

I acknowledge the duty to file, in this application or patent, notification of any change in status resulting in loss of 
entitlement to small entity status prior to paying, or at the time of paying, the earliest of the issue fee or any maintenance 
fee due after the date on which status as a small entity is no longer appropriate. (37 CFR 1.28(b)) 



NAME OF PERSON SIGNING Leighton K. CHONG 



TITLE OF PERSON IF OTHER THAN OWNER President 

REISSUE APPLICATION BY THE INVENTOR, 
OFFER TO SURRENDER PATENT 



Docket Number (Optional) 



This is part of the application for a reissue patent based on the original patent identified below. 



Name of Patentee(s): Leighton K. CHONG, Christine K. KAMPRATH 
Patent Number: USP 5,535,120 
Date Patent Issued: July 9, 1998 
Title of Invention: MACHINE TRANSLATION AND TELECOMMUNICATIONS SYSTEM USING USER ... 



I am the inventor of the original patent. 
I offer to surrender the original patent. 

1. ☒ Filed herein is a certificate under 37 CFR 3.73(b). 

2. □ Ownership of the patent is in the inventor(s), and no assignment of the patent has 
been made. 

One of boxes 1 or 2 above must be checked. 

The written consent of all assignees owning an undivided interest in the original patent is included in 
this application for reissue. 



Signature                               Date 

Typed or printed name 
Leighton K. CHONG 




The assignee owning an undivided interest in said original patent is TRANS-LINK INT'L CORP., 
and the assignee consents to the accompanying application for reissue. 

I hereby declare that all statements made herein of my own knowledge are true and that all 
statements made on information and belief are believed to be true; and further that these statements 
were made with the knowledge that willful false statements and the like so made are punishable by 
fine or imprisonment, or both, under 18 U.S.C. 1001 and that such willful false statements may 
jeopardize the validity of the application, any patent issued thereon, or any patent to which this 
declaration is directed. 



Name of assignee 
TRANS-LINK INTERNATIONAL CORP. 

Signature of person signing for assignee                Date 

Typed or printed name and title of person signing for assignee 
Leighton K. Chong, President 



Burden Hour Statement: This form is estimated to take 0.1 hours to complete. Time will vary depending upon the needs of the 
individual case. Any comments on the amount of time you are required to complete this form should be sent to the Chief 
Information Officer, Patent and Trademark Office, Washington, DC 20231. DO NOT SEND FEES OR COMPLETED FORMS 
TO THIS ADDRESS. SEND TO: Assistant Commissioner for Patents, Washington, DC 20231. 

REISSUE APPLICATION BY THE INVENTOR, 
OFFER TO SURRENDER PATENT 



Docket Number (Optional) 



This is part of the application for a reissue patent based on the original patent identified below. 



Name of Patentee(s): Leighton K. CHONG, Christine K. KAMPRATH 
Patent Number: USP 5,535,120 
Date Patent Issued: July 9, 1998 
Title of Invention: MACHINE TRANSLATION AND TELECOMMUNICATIONS SYSTEM USING USER ... 



I am the inventor of the original patent. 
I offer to surrender the original patent. 

1. ☒ Filed herein is a certificate under 37 CFR 3.73(b). 

2. □ Ownership of the patent is in the inventor(s), and no assignment of the patent has 
been made. 

One of boxes 1 or 2 above must be checked. 

The written consent of all assignees owning an undivided interest in the original patent is included in 
this application for reissue. 



Signature                               Date 

Typed or printed name 
Christine K. KAMPRATH 

The assignee owning an undivided interest in said original patent is TRANS-LINK INT'L CORP., 
and the assignee consents to the accompanying application for reissue. 



I hereby declare that all statements made herein of my own knowledge are true and that all 
statements made on information and belief are believed to be true; and further that these statements 
were made with the knowledge that willful false statements and the like so made are punishable by 
fine or imprisonment, or both, under 18 U.S.C. 1001 and that such willful false statements may 
jeopardize the validity of the application, any patent issued thereon, or any patent to which this 
declaration is directed. 



Name of assignee 
TRANS-LINK INTERNATIONAL CORP. 

Signature of person signing for assignee                Date 

Typed or printed name and title of person signing for assignee 
Leighton K. Chong, President 



Burden Hour Statement: This form is estimated to take 0.1 hours to complete. Time will vary depending upon the needs of the 
individual case. Any comments on the amount of time you are required to complete this form should be sent to the Chief 
Information Officer, Patent and Trademark Office, Washington, DC 20231. DO NOT SEND FEES OR COMPLETED FORMS 
TO THIS ADDRESS. SEND TO: Assistant Commissioner for Patents, Washington, DC 20231. 



IN THE UNITED STATES PATENT AND TRADEMARK OFFICE 

In Re Application For Reissue Of                          :  Atty. #: TLC-RE 
CHONG                                                     :  Examiner: 
For U.S. Patent 5,535,120 iss. July 9, 1996               :  Group No.: 
Filed: (Concurrently Herewith)                            : 
Title: MACHINE TRANSLATION AND TELECOMMUNICATIONS SYSTEM  : 



ORDER FOR A TITLE REPORT 

Assistant Commissioner for Patents 
U.S. Patent & Trademark Office 
Washington, D.C. 20231 

Sir: 

With the filing of the above-identified application for 
reissue, applicant requests that a title report for the patent be 
ordered and placed in the file, pursuant to 37 C.F.R. 1.171. 

Authorization is hereby given to charge our Deposit 
Account No. 15-0699 for the title report fee. 



Respectfully submitted, 
ATTORNEYS FOR APPLICANT 

Leighton K. Chong 

OSTRAGER CHONG FLAHERTY & ONOFRIO 
841 Bishop Street, Suite 1200 
Honolulu, HI 96813-3908 
Tel: (808) 533-4300 (H.S.T.) 